At a Glance
- Tasks: Build and scale distributed training systems for cutting-edge AI models.
- Company: Join a mission-driven team drawn from top AI companies like DeepMind and OpenAI.
- Benefits: Top-tier salary, comprehensive health benefits, and generous parental leave.
- Other info: Collaborative environment with daily meals and regular team celebrations.
- Why this job: Make a real impact in AI by developing open superintelligence technologies.
- Qualifications: Experience with distributed training systems and modern ML frameworks.
The predicted salary is between €80,000 and €100,000 per year.
Our Mission
Reflection’s mission is to build open superintelligence and make it accessible to all. We’re developing open-weight models for individuals, agents, enterprises, and even nation states. Our team of AI researchers and company builders comes from DeepMind, OpenAI, Google Brain, Meta, Character.AI, Anthropic and beyond.
About The Role
- Build and scale distributed training systems that power frontier model pre-training.
- Work closely with research teams to design and operate large-scale training runs for foundation models.
- Develop infrastructure that enables efficient training across thousands of GPUs using modern distributed training frameworks.
- Optimize training throughput, stability, and efficiency for large model training workloads.
- Collaborate directly with pre-training researchers to translate experimental ideas into scalable, production-ready training systems.
- Improve performance of distributed training workloads through optimization of communication, memory usage, and GPU utilization.
- Build and maintain training pipelines that support large-scale datasets, checkpointing, and experiment iteration.
- Debug and resolve performance bottlenecks across distributed training stacks including model parallelism, GPU communication, and training runtime systems.
- Contribute to the development of systems that enable rapid experimentation and iteration on new training techniques.
Ideal Experience
- Experience building or operating distributed training systems for large machine learning models.
- Strong experience working with modern distributed training frameworks such as Megatron, DeepSpeed, or similar large-scale training systems.
- Familiarity with large-scale model parallelism strategies (data, tensor, pipeline, or expert parallelism).
- Experience optimizing training throughput and GPU utilization in large distributed environments.
- Familiarity with GPU communication libraries such as NCCL and performance tuning for distributed workloads.
- Experience working closely with ML researchers to productionize experimental training workflows.
- Strong debugging skills across GPU compute, distributed training systems, and large-scale ML pipelines.
- Experience working with large datasets and training pipelines used for foundation model pre-training.
What We Offer
- Top-tier compensation: Salary and equity structured to recognize and retain the best talent globally.
- Health & wellness: Comprehensive medical, dental, vision, life, and disability insurance.
- Life & family: Fully paid parental leave for all new parents, including adoptive and surrogate journeys. Financial support for family planning.
- Benefits & balance: Paid time off when you need it, relocation support, and more perks that optimize your time.
- Opportunities to connect with teammates: lunch and dinner are provided daily. We have regular off-sites and team celebrations.
Member of Technical Staff - Pre-Training Infra employer: Reflection
At Reflection, we are committed to fostering a dynamic and inclusive work environment where innovation thrives. As a member of our Technical Staff, you will be part of a small, highly skilled team dedicated to pioneering open superintelligence, with access to top-tier compensation, comprehensive health benefits, and generous parental leave. Our culture prioritises collaboration and personal growth, ensuring that you can make a meaningful impact while enjoying a supportive work-life balance in a cutting-edge field.
StudySmarter Expert Advice🤫
We think this is how you could land the Member of Technical Staff - Pre-Training Infra role
✨Tip Number 1
Network like a pro! Reach out to folks in the industry, especially those who work at companies like Reflection. Use LinkedIn or even Twitter to connect with them and ask for insights about their work and the hiring process.
✨Tip Number 2
Show off your skills! If you’ve got a GitHub or portfolio showcasing your projects, make sure to highlight that when you chat with potential employers. It’s a great way to demonstrate your expertise in distributed training systems and large-scale ML pipelines.
✨Tip Number 3
Prepare for technical interviews by brushing up on your debugging skills and understanding of GPU communication libraries. Practice explaining your past experiences with distributed training frameworks like Megatron or DeepSpeed, as this will help you stand out.
✨Tip Number 4
Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in being part of our mission to build open superintelligence.
Some tips for your application 🫡
Tailor Your Application: Make sure to customise your CV and cover letter to highlight your experience with distributed training systems and large-scale model optimisation. We want to see how your skills align with our mission of building open superintelligence!
Showcase Relevant Experience: When detailing your past roles, focus on your hands-on experience with frameworks like Megatron or DeepSpeed. We love seeing specific examples of how you've tackled challenges in distributed environments, so don’t hold back!
Be Clear and Concise: Keep your application straightforward and to the point. Use clear language to describe your achievements and avoid jargon unless it’s relevant to the role. We appreciate clarity as much as we appreciate technical prowess!
Apply Through Our Website: We encourage you to submit your application directly through our website. It’s the best way for us to receive your details and ensures you’re considered for the role. Plus, it’s super easy!
How to prepare for a job interview at Reflection
✨Know Your Tech Inside Out
Make sure you’re well-versed in distributed training systems and the frameworks mentioned in the job description, like Megatron and DeepSpeed. Brush up on your knowledge of GPU communication libraries such as NCCL, as this will show your technical prowess and readiness to tackle the role.
✨Showcase Your Collaboration Skills
Since the role involves working closely with researchers, be prepared to discuss past experiences where you collaborated on projects. Highlight how you translated experimental ideas into practical solutions, as this will demonstrate your ability to work effectively within a team.
✨Prepare for Problem-Solving Questions
Expect to face questions about debugging and resolving performance bottlenecks. Think of specific examples from your experience where you optimised training throughput or improved GPU utilisation. This will help you illustrate your problem-solving skills and technical expertise.
✨Understand the Company’s Mission
Familiarise yourself with Reflection’s mission to build open superintelligence. Be ready to discuss how your skills and experiences align with their goals. Showing that you’re passionate about their vision can set you apart from other candidates.