Senior HPC/AI Infra SRE — 24/7 GPU Compute Reliability


Full-Time, £60,000 - £80,000 per year (est.), no working from home possible

At a Glance

Tasks: Ensure reliability and performance of high-performance computing infrastructure in a 24/7 support model.
Company: Join Radiant, a leader in advanced technology and innovation.
Benefits: Competitive salary, flexible working hours, and opportunities for professional growth.
Other info: Dynamic team environment with a focus on operational excellence.
Why this job: Work with cutting-edge GPU technologies and shape the future of computing.
Qualifications: Expertise in large-scale distributed systems and Linux performance tuning.

The predicted salary is between £60,000 and £80,000 per year.

Radiant in the United Kingdom is seeking a Senior Infrastructure Site Reliability Engineer, responsible for ensuring the reliability and performance of high-performance computing infrastructure. This role demands expertise in large-scale distributed systems and operational excellence within a 24/7 support model.

The ideal candidate will have extensive experience with GPU technologies, Linux systems, and performance tuning. Join us to work with advanced technology that influences next-generation compute environments.

About the employer: Radiant

Radiant is an exceptional employer that fosters a culture of innovation and collaboration, making it an ideal place for professionals passionate about high-performance computing. With a commitment to employee growth, we offer continuous learning opportunities and a supportive environment that values work-life balance. Located in the vibrant UK tech scene, our team enjoys access to cutting-edge technology and the chance to contribute to transformative projects in the AI and GPU domains.

Contact Details:

Radiant Recruitment Team


StudySmarter Expert Advice🤫

We think this is how you could land this role

✨Tip Number 1

Network, network, network! Reach out to folks in the HPC and AI communities. Attend meetups or webinars, and don’t be shy about sliding into DMs on LinkedIn. You never know who might have the inside scoop on job openings!

✨Tip Number 2

Show off your skills! Create a portfolio or GitHub repository showcasing your projects related to GPU technologies and performance tuning. This gives potential employers a taste of what you can bring to the table.

✨Tip Number 3

Prepare for technical interviews by brushing up on your knowledge of large-scale distributed systems. Practice common SRE scenarios and be ready to discuss how you’ve tackled reliability challenges in the past.

✨Tip Number 4

Apply through our website! We love seeing candidates who take the initiative. Tailor your application to highlight your experience with Linux systems and operational excellence, and let us know why you’re excited about working with cutting-edge technology.

We think you need these skills to ace this role

High-Performance Computing (HPC)

Site Reliability Engineering (SRE)

GPU Technologies

Linux Systems

Performance Tuning

Large-Scale Distributed Systems

Operational Excellence

24/7 Support Model

Some tips for your application 🫡

Tailor Your CV: Make sure your CV highlights your experience with GPU technologies and large-scale distributed systems. We want to see how your skills align with the role, so don’t be shy about showcasing your operational excellence!

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you’re passionate about high-performance computing and how your background makes you the perfect fit for our 24/7 support model. Let us know what excites you about working with advanced technology.

Showcase Relevant Projects: If you've worked on any projects related to HPC or AI infrastructure, make sure to mention them! We love seeing real-world applications of your skills, so include any performance tuning or reliability improvements you've achieved.

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you don’t miss out on any important updates. Plus, we can’t wait to see what you bring to the table!

How to prepare for a job interview at Radiant

✨Know Your Tech Inside Out

Make sure you brush up on your knowledge of GPU technologies and Linux systems. Be prepared to discuss specific projects where you've implemented performance tuning or managed large-scale distributed systems. This will show that you not only understand the theory but have practical experience too.

✨Demonstrate Operational Excellence

Since this role involves a 24/7 support model, be ready to talk about how you've handled incidents in high-pressure situations. Share examples of how you ensured reliability and performance in previous roles, and highlight any processes you put in place to improve operational efficiency.

✨Show Your Problem-Solving Skills

Prepare for scenario-based questions where you might need to troubleshoot a hypothetical issue with the HPC infrastructure. Think through your approach to diagnosing problems and how you would communicate solutions to both technical and non-technical stakeholders.

✨Cultural Fit Matters

Radiant is looking for someone who can thrive in their environment. Research their company culture and values, and think about how your personal values align with theirs. Be ready to discuss how you can contribute positively to their team dynamic.
