At a Glance
- Tasks: Manage high-performance GPU clusters and ensure system reliability for cutting-edge AI projects.
- Company: Exciting AI firm with a team from top tech companies, expanding in London.
- Benefits: Competitive salary, flexible work environment, and opportunities for professional growth.
- Other info: Be part of a dynamic team driving innovation in AI infrastructure.
- Why this job: Join a pioneering team and shape the future of AI technology with your expertise.
- Qualifications: 6 years of experience in high-performance fields, plus experience with large GPU clusters.
The predicted salary is between £80,000 and £100,000 per year.
We are recruiting for a young AI firm that started in the US and is now growing in London. Its engineers and researchers come from DeepMind, OpenAI, Google Brain, Meta, Character.AI, Anthropic and similar companies. The firm is looking for a Senior Systems Engineer to focus on cluster management, platform engineering for a large GPU fleet (currently in the range of 20,000 to 40,000 GPUs), monitoring and reliability, and infrastructure for next-generation GPU deployments.
Requirements:
- 6 years of experience in a high-performance field such as AI, big tech or quantitative trading
- Experience working on clusters of 1,000 GPUs or more
- Experience driving key projects within your team or business
Site Reliability Engineer, GPUs in AI. Employer: Radley James
Contact Details:
Radley James Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Site Reliability Engineer, GPUs in AI role
✨Tip Number 1
Network like a pro! Reach out to folks in the AI and tech scene, especially those who’ve worked at places like DeepMind or OpenAI. A friendly chat can open doors that a CV just can’t.
✨Tip Number 2
Show off your skills! If you’ve got experience with GPU clusters, make sure to highlight specific projects you’ve led. Use real examples to demonstrate how you’ve tackled challenges in high-performance environments.
✨Tip Number 3
Don’t just apply – engage! When you find a role that excites you, apply through our website and follow up with a message. Let them know you’re keen and why you’d be a great fit for their team.
✨Tip Number 4
Prepare for the tech interview! Brush up on your knowledge of cluster management and reliability engineering. Be ready to discuss how you'd handle scaling and monitoring in an environment with thousands of GPUs.
We think you need these skills to ace the Site Reliability Engineer, GPUs in AI role
Some tips for your application 🫡
Tailor Your CV: Make sure your CV is tailored to the role of Site Reliability Engineer. Highlight your experience with GPU clusters and any relevant projects you've led. We want to see how your background aligns with our needs!
Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're passionate about AI and how your skills can contribute to our team. Keep it engaging and personal – we love getting to know you!
Showcase Relevant Experience: When detailing your experience, focus on your work with high-performance systems and GPU management. We’re looking for specific examples that demonstrate your expertise in handling large clusters and driving key projects.
Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you don’t miss out on any important updates from our team. Let’s get started!
How to prepare for a job interview at Radley James
✨Know Your GPUs
Make sure you brush up on your knowledge of GPU architecture and performance metrics. Be ready to discuss your experience with clusters of 1,000 GPUs or more, as well as any specific projects you've led that involved high-performance computing.
✨Showcase Your Project Management Skills
Prepare to talk about key projects you've driven in your previous roles. Highlight your leadership skills and how you managed challenges, especially in high-pressure environments like AI or big tech. Use the STAR method (Situation, Task, Action, Result) to structure your answers.
✨Understand Cluster Management
Familiarise yourself with cluster management tools and techniques. Be ready to discuss your hands-on experience with monitoring and reliability practices, and how you've ensured uptime and performance in previous roles.
✨Cultural Fit Matters
This young AI firm values innovation and collaboration. Research their culture and be prepared to explain how your values align with theirs. Share examples of how you've worked effectively in diverse teams, especially in fast-paced environments.