At a Glance
- Tasks: Build and operate large-scale data ingestion systems for AI training.
- Company: Join a cutting-edge AI company with a mission to democratise superintelligence.
- Benefits: Top-tier salary, comprehensive health benefits, and generous parental leave.
- Why this job: Make a real impact on AI innovation while collaborating with world-class researchers.
- Qualifications: Experience in web crawling and large-scale data systems; strong communication skills.
- Other info: Enjoy a dynamic work environment with daily meals and team celebrations.
The predicted salary is between £36,000 and £60,000 per year.
Overview
Reflection’s mission is to build open superintelligence and make it accessible to all. We’re developing open-weight models for individuals, agents, enterprises, and even nation states. Our team of AI researchers and company builders comes from DeepMind, OpenAI, Google Brain, Meta, Character.AI, Anthropic and beyond.
About The Role
Data is playing an increasingly crucial role at the frontier of AI innovation. Many of the most meaningful advances in recent years have come not from new architectures, but from better data. As a member of the Data Team, your mission is to build and operate the ingestion systems that turn the open web and other large-scale data sources into reliable, well-structured corpora for training frontier models. You will own the machinery that acquires, extracts, normalizes, versions, and delivers data to our pre-training pipelines. You’ll work directly with world-class researchers to close the loop between what we collect and how it impacts model performance.
This role is ideal for engineers who love building robust distributed systems, but who also want to run experiments, reason about tradeoffs in data acquisition, and iterate quickly based on measurable impact.
Working closely with our pre-training and data quality teams, you will:
- Build and operate large-scale data ingestion systems for pre-training, including web crawling, extraction, and dataset delivery
- Run experiments to evaluate crawling strategies, extraction methods, and ingestion tradeoffs
- Analyze ingested data to identify gaps, redundancy, and areas to improve
- Build ingestion pipelines that scale reliably across large data campaigns
- Develop specialized crawlers for high-priority data sources
- Review code, debug production issues, and continuously improve ingestion infrastructure
About You
- Curious about how training data influences model capabilities, and able to iterate quickly based on measurable downstream impact
- Able to collaborate tightly across functions: researchers, infra, operations, and external partners
- Enjoy working in a hybrid research–engineering role
Skills And Qualifications
- Experience building web crawling, data ingestion, or large-scale data acquisition systems using Ray, Beam, Spark, or similar technologies
- Familiarity with how LLMs are trained and evaluated, and an intuition for what makes data useful for training
- Comfortable working with very large datasets (multi-TB to PB scale) and building systems that are observable, testable, and maintainable
- Comfortable designing experiments and using data to guide system improvements
- Excellent communication skills: you can explain system behavior, and you consider and communicate tradeoffs clearly
What We Offer
- Top-tier compensation: Salary and equity structured to recognize and retain the best talent globally
- Health & wellness: Comprehensive medical, dental, vision, life, and disability insurance
- Life & family: Fully paid parental leave for all new parents, including adoptive and surrogate journeys. Financial support for family planning
- Benefits & balance: Paid time off when you need it, relocation support, and more perks that optimize your time
- Opportunities to connect with teammates: lunch and dinner are provided daily. We have regular off-sites and team celebrations.
Role: Member of Technical Staff - Data Ingestion Engineer. Employer: Reflection AI
Contact Detail:
Reflection AI Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Member of Technical Staff - Data Ingestion Engineer role
✨Tip Number 1
Network like a pro! Reach out to folks in the industry, especially those who work at companies you're interested in. A friendly chat can open doors and give you insights that job descriptions just can't.
✨Tip Number 2
Show off your skills! Create a portfolio or GitHub repo showcasing your projects related to data ingestion or web crawling. This gives potential employers a taste of what you can do and sets you apart from the crowd.
✨Tip Number 3
Prepare for interviews by practising common technical questions and scenarios related to data systems. We recommend doing mock interviews with friends or using online platforms to get comfortable with the format.
✨Tip Number 4
Don't forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you're genuinely interested in joining our team!
Some tips for your application 🫡
Tailor Your CV: Make sure your CV reflects the skills and experiences that align with the role of Data Ingestion Engineer. Highlight any experience you have with web crawling, data ingestion, or large-scale data systems. We want to see how your background fits into our mission!
Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're passionate about data and AI. Share specific examples of projects you've worked on that relate to the job. We love seeing your personality come through!
Showcase Your Technical Skills: Don’t forget to mention your experience with technologies like Ray, Beam, or Spark. If you've built any systems or run experiments, let us know! We’re looking for engineers who can hit the ground running.
Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you don’t miss out on any important updates. Plus, we love seeing applications come in through our own platform!
How to prepare for a job interview at Reflection AI
✨Know Your Data Ingestion Systems
Make sure you brush up on your knowledge of web crawling and data ingestion systems. Be ready to discuss your experience with technologies like Ray, Beam, or Spark, and how you've built or improved these systems in the past.
✨Show Your Curiosity
Demonstrate your curiosity about how training data influences model capabilities. Prepare examples of how you've iterated on projects based on measurable impact, and be ready to discuss any experiments you've run to evaluate different strategies.
✨Collaboration is Key
Highlight your ability to collaborate across functions. Think of specific instances where you've worked closely with researchers, operations, or external partners, and be prepared to explain how those collaborations led to successful outcomes.
✨Communicate Clearly
Practice explaining complex system behaviours in simple terms. Be ready to discuss trade-offs in your previous projects and how you communicated these to your team. Clear communication can set you apart from other candidates.