Site Reliability Engineer, GPUs in AI in London

London | Full-Time | £80,000 - £100,000 / year (est.) | No home office possible

At a Glance

Tasks: Manage and optimise GPU clusters for cutting-edge AI projects.
Company: Exciting AI firm with a team from top tech companies.
Benefits: Competitive salary, flexible working, and opportunities for growth.
Other info: Dynamic work environment with a focus on innovation and collaboration.
Why this job: Join a pioneering team and shape the future of AI technology.
Qualifications: 6 years in high-performance fields and experience with large GPU clusters.

The predicted salary is between £80,000 and £100,000 per year.

We are recruiting for a young AI firm that started in the US and is now growing in London. Its team of engineers and researchers comes from DeepMind, OpenAI, Google Brain, Meta, Character.AI, Anthropic, and others. They are looking for a Senior Systems Engineer to focus on cluster management, platform engineering for a large GPU fleet (currently in the 20k-40k range), monitoring and reliability, and infrastructure for next-generation GPU deployments.

Requirements:

6 years' experience in a high-performance field such as AI, big tech, or quantitative trading
Experience working on clusters of 1,000 GPUs or more
Experience driving key projects in your team or business

Site Reliability Engineer, GPUs in AI in London employer: Radley James

Join a dynamic and innovative AI firm that is rapidly expanding in London, where you'll collaborate with top-tier engineers and researchers from leading tech companies. We offer a vibrant work culture that fosters creativity and collaboration, alongside ample opportunities for professional growth in the cutting-edge field of AI. With access to state-of-the-art technology and a focus on next-generation GPU deployments, this role promises a meaningful and rewarding career path in a supportive environment.

Contact Details:

Radley James Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Site Reliability Engineer, GPUs in AI role in London

✨Tip Number 1

Network like a pro! Reach out to folks in the AI and tech scene, especially those who've worked at places like DeepMind or OpenAI. A friendly chat can open doors and give you insider info on job openings.

✨Tip Number 2

Show off your skills! Create a portfolio or GitHub repo showcasing your projects related to cluster management and GPU deployments. This gives potential employers a taste of what you can bring to the table.

✨Tip Number 3

Prepare for technical interviews by brushing up on your knowledge of high-performance computing and reliability engineering. Practice common interview questions and scenarios that focus on managing large GPU clusters.

✨Tip Number 4

Don't forget to apply through our website! We’ve got loads of opportunities, and applying directly can sometimes give you an edge. Plus, it’s super easy to keep track of your applications!

We think you need these skills to ace the Site Reliability Engineer, GPUs in AI role in London

Cluster Management

Platform Engineering

GPU Management

Monitoring and Reliability

Infrastructure Development

High-Performance Computing

Project Management

Experience with Large-Scale Deployments

AI Systems Knowledge

Collaboration Skills

Problem-Solving Skills

Technical Expertise in AI

Some tips for your application 🫡

Tailor Your CV: Make sure your CV is tailored to the role of Site Reliability Engineer. Highlight your experience with GPU clusters and any relevant projects you've led. We want to see how your background aligns with our needs!

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're passionate about AI and how your skills can contribute to our team. Keep it engaging and personal – we love to see your personality!

Showcase Relevant Experience: When detailing your experience, focus on your work with high-performance systems and GPU management. Mention specific projects or achievements that demonstrate your expertise in handling large clusters.

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it’s super easy – just a few clicks!

How to prepare for a job interview at Radley James

✨Know Your GPUs

Make sure you brush up on your knowledge of GPU architecture and performance metrics. Be ready to discuss your experience with clusters of 1000 GPUs or more, as this will be a key focus for the role.

✨Showcase Your Project Experience

Prepare to talk about specific projects where you've driven key initiatives. Highlight your contributions to cluster management and platform engineering, especially in high-performance environments like AI or big tech.

✨Understand the Company Culture

Research the company’s background and its team, whose engineers and researchers come from DeepMind, OpenAI, and others. Understanding their mission and values will help you align your answers with what they’re looking for in a candidate.

✨Ask Insightful Questions

Prepare thoughtful questions about their current infrastructure challenges and future GPU deployments. This shows your genuine interest in the role and helps you gauge if the company is the right fit for you.
