Site Reliability Engineer, GPUs in AI in London

London, Full-Time, £80,000 - £100,000 / year (est.), No home office possible
Radley James

At a Glance

  • Tasks: Manage and optimise GPU clusters for cutting-edge AI projects.
  • Company: Exciting AI firm with a team from top tech companies.
  • Benefits: Competitive salary, flexible working, and opportunities for growth.
  • Other info: Dynamic work environment with a focus on innovation and collaboration.
  • Why this job: Join a pioneering team and shape the future of AI technology.
  • Qualifications: 6 years in high-performance fields and experience with large GPU clusters.

The predicted salary is between £80,000 and £100,000 per year.

We are recruiting for a young AI firm that started in the US and is now growing in London. Its engineers and researchers come from DeepMind, OpenAI, Google Brain, Meta, Character.AI, Anthropic and similar organisations. They are looking for a Senior Systems Engineer to focus on cluster management, platform engineering for a large GPU fleet (currently in the range of 20k-40k GPUs), monitoring and reliability, and infrastructure for next-generation GPU deployments.

Requirements:

  • 6 years' experience in a high-performance field such as AI, big tech, or quantitative trading
  • Experience working on clusters of 1,000 GPUs or more
  • Experience of driving key projects in your team or business

Site Reliability Engineer, GPUs in AI in London employer: Radley James

Join a dynamic and innovative AI firm that is rapidly expanding in London, where you'll collaborate with top-tier engineers and researchers from leading tech companies. We offer a vibrant work culture that fosters creativity and collaboration, alongside ample opportunities for professional growth in the cutting-edge field of AI. With access to state-of-the-art technology and a focus on next-generation GPU deployments, this role promises a meaningful and rewarding career path in a supportive environment.

Contact Details:

Radley James Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land Site Reliability Engineer, GPUs in AI in London

✨Tip Number 1

Network like a pro! Reach out to folks in the AI and tech scene, especially those who've worked at places like DeepMind or OpenAI. A friendly chat can open doors and give you insider info on job openings.

✨Tip Number 2

Show off your skills! Create a portfolio or GitHub repo showcasing your projects related to cluster management and GPU deployments. This gives potential employers a taste of what you can bring to the table.

✨Tip Number 3

Prepare for technical interviews by brushing up on your knowledge of high-performance computing and reliability engineering. Practice common interview questions and scenarios that focus on managing large GPU clusters.

✨Tip Number 4

Don't forget to apply through our website! We’ve got loads of opportunities, and applying directly can sometimes give you an edge. Plus, it’s super easy to keep track of your applications!

We think you need these skills to ace Site Reliability Engineer, GPUs in AI in London

Cluster Management
Platform Engineering
GPU Management
Monitoring and Reliability
Infrastructure Development
High-Performance Computing
Project Management
Experience with Large-Scale Deployments
AI Systems Knowledge
Collaboration Skills
Problem-Solving Skills
Technical Expertise in AI

Some tips for your application 🫡

Tailor Your CV: Make sure your CV is tailored to the role of Site Reliability Engineer. Highlight your experience with GPU clusters and any relevant projects you've led. We want to see how your background aligns with our needs!

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're passionate about AI and how your skills can contribute to our team. Keep it engaging and personal – we love to see your personality!

Showcase Relevant Experience: When detailing your experience, focus on your work with high-performance systems and GPU management. Mention specific projects or achievements that demonstrate your expertise in handling large clusters.

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it’s super easy – just a few clicks!

How to prepare for a job interview at Radley James

✨Know Your GPUs

Brush up on GPU architecture and performance metrics. Be ready to discuss your experience with clusters of 1,000 GPUs or more, as this will be a key focus for the role.

✨Showcase Your Project Experience

Prepare to talk about specific projects where you've driven key initiatives. Highlight your contributions to cluster management and platform engineering, especially in high-performance environments like AI or big tech.

✨Understand the Company Culture

Research the company's background and its team, whose engineers and researchers come from DeepMind, OpenAI, and others. Understanding their mission and values will help you align your answers with what they're looking for in a candidate.

✨Ask Insightful Questions

Prepare thoughtful questions about their current infrastructure challenges and future GPU deployments. This shows your genuine interest in the role and helps you gauge if the company is the right fit for you.
