Senior Site Reliability Engineer in London

London | Full-Time | 70000 - 90000 € / year (est.) | No home office possible

At a Glance

Tasks: Ensure reliability and performance of a global compute platform while collaborating across teams.
Company: High-growth infrastructure company focused on advanced machine learning workloads.
Benefits: Competitive salary, equity package, health coverage, and generous paid time off.
Other info: Opportunity for hands-on engineering in a dynamic, collaborative culture.
Why this job: Join a fast-paced environment and make a real impact on cutting-edge technology.
Qualifications: 5+ years in site reliability engineering or DevOps with strong communication skills.

The predicted salary is between 70000 and 90000 € per year.

High-growth infrastructure company focused on delivering large-scale compute, data centre capacity, and power solutions for advanced machine learning workloads. Platforms support leading research and industry teams requiring high-performance computing at significant scale. Fast-paced environment with emphasis on ownership, execution speed, and quality. Culture centred on pragmatic problem-solving, cross-functional collaboration, and full lifecycle responsibility.

Role Overview:

Position operating across software, infrastructure, and operations to ensure reliability, scalability, and performance of a globally distributed compute platform. Close collaboration with networking, platform engineering, and physical infrastructure teams to design and operate systems supporting high-demand computational workloads. Hands-on engineering role requiring strong systems expertise, with responsibility for resolving complex production issues, improving system resilience, and enhancing platform observability.

Responsibilities:

Deployment and management of large-scale compute clusters using automation tooling, with adaptation to customer requirements
Validation and optimisation of compute, storage, and networking systems in coordination with internal teams and vendors
Execution of large-scale data migrations between cloud and on-premise environments with focus on efficiency and cost
Troubleshooting across the full stack, including hardware, networking, and distributed systems
Development of internal tooling and automation to improve deployment speed, reliability, and operational efficiency
Participation in an on-call rotation (approximately one week per month)

Key Attributes:

Strong ownership mindset with focus on delivery and accountability
Experience building maintainable, well-documented systems in complex environments
Ability to operate effectively in ambiguous and rapidly evolving contexts
Clear and effective communication skills with collaborative, low-ego approach

Minimum Requirements:

5+ years of experience in site reliability engineering, DevOps, systems administration, or high-performance computing
Strong written and verbal communication skills in English
Experience deploying and operating container orchestration or workload scheduling systems (e.g. Kubernetes or similar)
Programming or scripting experience in Go, Python, or Bash
Familiarity with infrastructure automation and infrastructure-as-code tools
Strong technical foundation in computing or related discipline

Preferred Experience:

Experience operating large-scale machine learning or AI-compute workloads
Background in multi-tenant distributed systems at scale
Hands-on experience with data centre or bare-metal infrastructure
Knowledge of high-performance networking technologies
Experience managing large-scale storage systems (commercial or open-source)

Compensation & Benefits:

Competitive salary and equity package
Retirement or pension contributions aligned with local standards
Health coverage including medical, dental, and vision
Generous paid time off policy

Employer: Realm

Join a high-growth infrastructure company that prioritises innovation and collaboration, offering a dynamic work environment where your contributions directly impact the success of advanced machine learning workloads. With a strong focus on employee growth, competitive compensation, and a culture that values pragmatic problem-solving, you will thrive in a role that encourages ownership and accountability while working alongside talented professionals in a fast-paced setting. Enjoy comprehensive health benefits, generous paid time off, and opportunities to develop your skills in a supportive atmosphere.

Contact Details:

Realm Recruiting Team

StudySmarter Expert Advice🤫

We think this is how you could land the Senior Site Reliability Engineer role in London

✨Tip Number 1

Network like a pro! Reach out to folks in the industry, attend meetups, and connect with potential colleagues on LinkedIn. You never know who might have the inside scoop on job openings or can put in a good word for you.

✨Tip Number 2

Show off your skills! Create a portfolio or GitHub repository showcasing your projects, especially those related to site reliability engineering or high-performance computing. This gives employers a taste of what you can do beyond your CV.

✨Tip Number 3

Prepare for interviews by brushing up on technical questions and real-world scenarios. Practice explaining complex concepts clearly and concisely, as communication is key in a collaborative environment like ours.

✨Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in joining our team at StudySmarter.

We think you need these skills to ace the Senior Site Reliability Engineer role in London

Site Reliability Engineering

DevOps

Systems Administration

High-Performance Computing

Container Orchestration

Workload Scheduling Systems

Kubernetes

Programming in Go

Python

Bash

Infrastructure Automation

Infrastructure-as-Code Tools

Troubleshooting Distributed Systems

Data Centre Management

High-Performance Networking Technologies

Some tips for your application 🫡

Tailor Your CV: Make sure your CV reflects the skills and experiences that match the Senior Site Reliability Engineer role. Highlight your experience with large-scale compute clusters, automation tooling, and any relevant programming languages like Go or Python.

Craft a Compelling Cover Letter: Use your cover letter to tell us why you're passionate about site reliability engineering and how your background aligns with our fast-paced environment. Share specific examples of your problem-solving skills and collaborative projects.

Showcase Your Technical Skills: Don’t shy away from detailing your technical expertise in your application. Mention your experience with container orchestration systems like Kubernetes and any hands-on work with data centre infrastructure to stand out.

Apply Through Our Website: We encourage you to apply directly through our website for the best chance of getting noticed. It’s the quickest way for us to see your application and get the ball rolling on your journey with StudySmarter!

How to prepare for a job interview at Realm

✨Know Your Tech Inside Out

Make sure you brush up on your technical skills, especially around site reliability engineering and high-performance computing. Be ready to discuss your experience with container orchestration systems like Kubernetes and any programming or scripting you've done in Go, Python, or Bash.

✨Showcase Your Problem-Solving Skills

Prepare examples of how you've tackled complex production issues in the past. Highlight your hands-on experience with troubleshooting across hardware, networking, and distributed systems. This will demonstrate your ability to thrive in a fast-paced environment.

✨Emphasise Collaboration

Since this role involves close collaboration with various teams, be ready to talk about your experiences working cross-functionally. Share instances where your clear communication and low-ego approach helped resolve issues or improve processes.

✨Demonstrate Ownership and Accountability

The company values a strong ownership mindset, so come prepared to discuss how you've taken responsibility for projects in the past. Talk about how you ensure quality and execution speed in your work, and how you adapt to changing requirements.
