At a Glance
- Tasks: Build and scale distributed training systems for cutting-edge AI models.
- Company: Join a mission-driven team of AI experts from top tech companies.
- Benefits: Top-tier salary, comprehensive health benefits, and generous parental leave.
- Other info: Dynamic team culture with daily meals and regular team celebrations.
- Why this job: Make a real impact in the future of open superintelligence.
- Qualifications: Experience with distributed training systems and modern ML frameworks.
The predicted salary is between €80,000 and €100,000 per year.
Our Mission
Reflection’s mission is to build open superintelligence and make it accessible to all. We’re developing open-weight models for individuals, agents, enterprises, and even nation states. Our team of AI researchers and company builders comes from DeepMind, OpenAI, Google Brain, Meta, Character.AI, Anthropic and beyond.
About The Role
- Build and scale distributed training systems that power frontier model pre-training.
- Work closely with research teams to design and operate large-scale training runs for foundation models.
- Develop infrastructure that enables efficient training across thousands of GPUs using modern distributed training frameworks.
- Optimize training throughput, stability, and efficiency for large model training workloads.
- Collaborate directly with pre-training researchers to translate experimental ideas into scalable, production-ready training systems.
- Improve performance of distributed training workloads through optimization of communication, memory usage, and GPU utilization.
- Build and maintain training pipelines that support large-scale datasets, checkpointing, and experiment iteration.
- Debug and resolve performance bottlenecks across distributed training stacks including model parallelism, GPU communication, and training runtime systems.
- Contribute to the development of systems that enable rapid experimentation and iteration on new training techniques.
Ideal Experience
- Experience building or operating distributed training systems for large machine learning models.
- Strong experience working with modern distributed training frameworks such as Megatron, DeepSpeed, or similar large-scale training systems.
- Familiarity with large-scale model parallelism strategies (data, tensor, pipeline, or expert parallelism).
- Experience optimizing training throughput and GPU utilization in large distributed environments.
- Familiarity with GPU communication libraries such as NCCL and performance tuning for distributed workloads.
- Experience working closely with ML researchers to productionize experimental training workflows.
- Strong debugging skills across GPU compute, distributed training systems, and large-scale ML pipelines.
- Experience working with large datasets and training pipelines used for foundation model pre-training.
What We Offer
- Top-tier compensation: Salary and equity structured to recognize and retain the best talent globally.
- Health & wellness: Comprehensive medical, dental, vision, life, and disability insurance.
- Life & family: Fully paid parental leave for all new parents, including adoptive and surrogate journeys. Financial support for family planning.
- Benefits & balance: Paid time off when you need it, relocation support, and more perks that optimize your time.
- Opportunities to connect with teammates: lunch and dinner are provided daily. We have regular off-sites and team celebrations.
Member of Technical Staff - Pre-Training Infra in London employer: Reflection
At Reflection, we are committed to fostering a dynamic and inclusive work environment where innovation thrives. As a member of our Technical Staff, you will be part of a small, highly skilled team dedicated to pioneering open superintelligence, with access to top-tier compensation, comprehensive health benefits, and generous parental leave. Our culture prioritises collaboration and personal growth, ensuring that you can make a meaningful impact while enjoying a supportive work-life balance in a cutting-edge field.
StudySmarter Expert Advice🤫
We think this is how you could land the Member of Technical Staff - Pre-Training Infra role in London
✨Tip Number 1
Network like a pro! Reach out to folks in the industry, especially those who work at companies like Reflection. Use LinkedIn or even Twitter to connect with them and ask for insights about their work and the hiring process.
✨Tip Number 2
Show off your skills! If you’ve got a GitHub or portfolio showcasing your projects, make sure to highlight that when you chat with potential employers. It’s a great way to demonstrate your expertise in distributed training systems and large-scale ML pipelines.
✨Tip Number 3
Prepare for technical interviews by brushing up on your debugging skills and understanding of GPU communication libraries. Practice explaining your past experiences with distributed training frameworks like Megatron or DeepSpeed, as this will help you stand out.
✨Tip Number 4
Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in being part of our mission to build open superintelligence.
Some tips for your application 🫡
Tailor Your Application: Make sure to customise your CV and cover letter for the role. Highlight your experience with distributed training systems and any relevant frameworks like Megatron or DeepSpeed. We want to see how your skills align with our mission!
Showcase Your Projects: Include specific examples of projects you've worked on that relate to large-scale model training. If you've optimised GPU utilisation or tackled performance bottlenecks, let us know! This helps us understand your hands-on experience.
Be Clear and Concise: When writing your application, keep it straightforward. Use clear language and avoid jargon unless it's necessary. We appreciate a well-structured application that gets straight to the point!
Apply Through Our Website: We encourage you to submit your application through our website. It’s the best way for us to receive your details and ensures you’re considered for the role. Plus, it’s super easy to do!
How to prepare for a job interview at Reflection
✨Know Your Tech Inside Out
Make sure you’re well-versed in distributed training systems and frameworks like Megatron or DeepSpeed. Brush up on your knowledge of GPU communication libraries such as NCCL, as well as model parallelism strategies. Being able to discuss these topics confidently will show that you’re not just familiar with the tech, but that you can also contribute meaningfully.
✨Showcase Your Problem-Solving Skills
Prepare to discuss specific examples where you've debugged performance bottlenecks or optimised training throughput. Use the STAR method (Situation, Task, Action, Result) to structure your answers. This will help you articulate your thought process and demonstrate your ability to tackle complex challenges in distributed training environments.
✨Collaborate Like a Pro
Since the role involves working closely with ML researchers, be ready to talk about your experience collaborating with cross-functional teams. Highlight any projects where you translated experimental ideas into scalable systems. This shows that you can bridge the gap between research and production, which is crucial for this position.
✨Ask Insightful Questions
Prepare thoughtful questions about the company’s approach to building open superintelligence and their vision for the future. Inquire about the challenges they face in scaling distributed training systems or how they foster collaboration within the team. This not only shows your interest in the role but also helps you gauge if the company aligns with your values.