
Radley James

Site Reliability Engineer, GPUs in AI

Full-time · £80,000 to £100,000 per year (est.) · No home office possible


At a Glance

Tasks: Manage high-performance GPU clusters and ensure system reliability for cutting-edge AI projects.
Company: Exciting AI firm with a team from top tech companies, expanding in London.
Benefits: Competitive salary, flexible work environment, and opportunities for professional growth.
Other info: Be part of a dynamic team driving innovation in AI infrastructure.
Why this job: Join a pioneering team and shape the future of AI technology with your expertise.
Qualifications: 6 years' experience in high-performance fields, plus hands-on work with large GPU clusters.

The predicted salary is between £80,000 and £100,000 per year.

We are recruiting for a young AI firm that started in the US and is now growing in London. Its team of engineers and researchers comes from DeepMind, OpenAI, Google Brain, Meta, Character.AI and Anthropic, among others. The firm is looking for a Senior Systems Engineer to focus on cluster management, platform engineering for a very large GPU fleet (currently in the range of 20,000 to 40,000 GPUs), and monitoring and reliability, as well as infrastructure for next-generation GPU deployments.

Requirements:

6 years' experience in a high-performance field such as AI, big tech, or quantitative trading
Experience working on clusters of 1,000 GPUs or more
Experience driving key projects within your team or business

Site Reliability Engineer, GPUs in AI: employer Radley James

Join a dynamic and innovative AI firm that is rapidly expanding in London, where you'll collaborate with top-tier engineers and researchers from leading tech companies. We offer a vibrant work culture that fosters creativity and collaboration, alongside ample opportunities for professional growth in the cutting-edge field of AI. With access to state-of-the-art technology and a focus on next-generation GPU deployments, this role promises a meaningful and rewarding career path in a supportive environment.

Contact details:

Radley James Recruiting Team


StudySmarter Expert Advice 🤫

We think this is how you could land the Site Reliability Engineer, GPUs in AI role

✨Tip Number 1

Network like a pro! Reach out to folks in the AI and tech scene, especially those who’ve worked at places like DeepMind or OpenAI. A friendly chat can open doors that a CV just can’t.

✨Tip Number 2

Show off your skills! If you’ve got experience with GPU clusters, make sure to highlight specific projects you’ve led. Use real examples to demonstrate how you’ve tackled challenges in high-performance environments.

✨Tip Number 3

Don’t just apply – engage! When you find a role that excites you, apply through our website and follow up with a message. Let them know you’re keen and why you’d be a great fit for their team.

✨Tip Number 4

Prepare for the tech interview! Brush up on your knowledge of cluster management and reliability engineering. Be ready to discuss how you'd handle scaling and monitoring in an environment with thousands of GPUs.

We think you need these skills to ace the Site Reliability Engineer, GPUs in AI role

Cluster Management

Platform Engineering

GPU Management

Monitoring and Reliability

Infrastructure Development

High-Performance Computing

Project Management

Experience with Large-Scale Deployments

AI Systems Knowledge

Collaboration Skills

Problem-Solving Skills

Technical Expertise in AI

Some tips for your application 🫡

Tailor Your CV: Make sure your CV is tailored to the role of Site Reliability Engineer. Highlight your experience with GPU clusters and any relevant projects you've led. We want to see how your background aligns with our needs!

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're passionate about AI and how your skills can contribute to our team. Keep it engaging and personal – we love getting to know you!

Showcase Relevant Experience: When detailing your experience, focus on your work with high-performance systems and GPU management. We’re looking for specific examples that demonstrate your expertise in handling large clusters and driving key projects.

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you don’t miss out on any important updates from our team. Let’s get started!

How to prepare for a job interview at Radley James

✨Know Your GPUs

Make sure you brush up on your knowledge of GPU architecture and performance metrics. Be ready to discuss your experience with clusters of 1,000 GPUs or more, as well as any specific projects you've led that involved high-performance computing.

✨Showcase Your Project Management Skills

Prepare to talk about key projects you've driven in your previous roles. Highlight your leadership skills and how you managed challenges, especially in high-pressure environments like AI or big tech. Use the STAR method (Situation, Task, Action, Result) to structure your answers.

✨Understand Cluster Management

Familiarise yourself with cluster management tools and techniques. Be ready to discuss your hands-on experience with monitoring and reliability practices, and how you've ensured uptime and performance in previous roles.

✨Cultural Fit Matters

This young AI firm values innovation and collaboration. Research their culture and be prepared to explain how your values align with theirs. Share examples of how you've worked effectively in diverse teams, especially in fast-paced environments.
