Senior Site Reliability Engineer - DevOps (Remote) in City of London

City of London | Full-Time | £70,000 - £90,000 / year (est.) | No home office possible

At a Glance

Tasks: Ensure reliability and performance of a global compute platform while resolving complex production issues.
Company: Join a high-growth infrastructure company at the forefront of machine learning workloads.
Benefits: Enjoy competitive salary, equity, health coverage, and generous paid time off.
Other info: Collaborate closely with teams to design and operate high-demand computational systems.
Why this job: Make a real impact in a fast-paced environment with cutting-edge technology.
Qualifications: 5+ years in site reliability engineering or DevOps, with strong systems expertise.

The predicted salary is between £70,000 and £90,000 per year.

High-growth infrastructure company focused on delivering large-scale compute, data centre capacity, and power solutions for advanced machine learning workloads. Platforms support leading research and industry teams requiring high-performance computing at significant scale. Fast-paced environment with emphasis on ownership, execution speed, and quality.

Position operating across software, infrastructure, and operations to ensure reliability, scalability, and performance of a globally distributed compute platform. Close collaboration with networking, platform engineering, and physical infrastructure teams to design and operate systems supporting high-demand computational workloads.

Hands-on engineering role requiring strong systems expertise, with responsibility for resolving complex production issues, improving system resilience, and enhancing platform observability.

Deployment and management of large-scale compute clusters using automation tooling, with adaptation to customer requirements.
Validation and optimisation of compute, storage, and networking systems in coordination with internal teams and vendors.
Execution of large-scale data migrations between cloud and on-premise environments with focus on efficiency and cost.
Troubleshooting across the full stack, including hardware, networking, and distributed systems.
Development of internal tooling and automation to improve deployment speed, reliability, and operational efficiency.

Experience building maintainable, well-documented systems in complex environments.

5+ years of experience in site reliability engineering, DevOps, systems administration, or high-performance computing.
Strong written and verbal communication skills in English.
Programming or scripting experience in Go, Python, or Bash.
Strong technical foundation in computing or related discipline.
Experience operating large-scale machine learning or AI-compute workloads.
Hands-on experience with data centre or bare-metal infrastructure.
Knowledge of high-performance networking technologies.
Experience managing large-scale storage systems (commercial or open-source).

Compensation & Benefits:

Competitive salary and equity package.
Retirement or pension contributions aligned with local standards.
Health coverage including medical, dental, and vision.
Generous paid time off policy.

Senior Site Reliability Engineer - DevOps (Remote) in City of London employer: Realm

Join a high-growth infrastructure company that prioritises innovation and collaboration, offering a dynamic remote work environment for Senior Site Reliability Engineers. With a strong focus on employee development, competitive compensation, and comprehensive health benefits, this role provides the opportunity to work on cutting-edge technology while enjoying a culture that values ownership and quality. Experience the unique advantage of contributing to large-scale machine learning workloads, all from the comfort of your own home.

Contact Details:

Realm Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Senior Site Reliability Engineer - DevOps (Remote) role in City of London

✨Tip Number 1

Network like a pro! Reach out to folks in the industry, attend meetups, and connect with potential colleagues on LinkedIn. We all know that sometimes it’s not just what you know, but who you know that can land you that dream job.

✨Tip Number 2

Show off your skills! Create a portfolio or GitHub repository showcasing your projects, especially those related to site reliability engineering or DevOps. We want to see your hands-on experience and how you tackle real-world problems.

✨Tip Number 3

Prepare for the interview like it’s a high-stakes game! Research the company, understand their tech stack, and be ready to discuss how your experience aligns with their needs. We’re looking for candidates who can hit the ground running!

✨Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, we love seeing candidates who are proactive about their job search!

We think you need these skills to ace Senior Site Reliability Engineer - DevOps (Remote) in City of London

Site Reliability Engineering

DevOps

Systems Administration

High-Performance Computing

Automation Tooling

Data Migration

Troubleshooting

Programming in Go

Python

Bash

Networking Technologies

Large-Scale Storage Systems

System Resilience

Platform Observability

Communication Skills

Some tips for your application 🫡

Tailor Your CV: Make sure your CV reflects the skills and experiences that match the job description. Highlight your expertise in site reliability engineering, DevOps, and any relevant programming languages like Go or Python. We want to see how you can contribute to our fast-paced environment!

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're passionate about high-performance computing and how your background aligns with our mission. Be sure to mention any hands-on experience you've had with large-scale compute clusters or data migrations.

Showcase Your Problem-Solving Skills: In your application, don’t shy away from sharing examples of complex production issues you've resolved or how you've improved system resilience. We love seeing candidates who can demonstrate ownership and execution speed in their previous roles!

Apply Through Our Website: We encourage you to apply directly through our website for the best chance of getting noticed. It’s super easy, and you’ll be able to keep track of your application status. Plus, we’re excited to see what you bring to the table!

How to prepare for a job interview at Realm

✨Know Your Tech Inside Out

Make sure you brush up on your technical skills, especially in areas like systems administration, high-performance computing, and the programming languages mentioned in the job description. Be ready to discuss your hands-on experience with large-scale compute clusters and any complex production issues you've resolved.

✨Showcase Your Problem-Solving Skills

Prepare examples of how you've tackled challenging problems in previous roles. Think about specific instances where you improved system resilience or enhanced platform observability. This will demonstrate your ability to think critically and act decisively in a fast-paced environment.

✨Collaboration is Key

Since this role involves close collaboration with various teams, be prepared to discuss your experience working with networking, platform engineering, and physical infrastructure teams. Highlight any successful projects where teamwork played a crucial role in achieving results.

✨Ask Insightful Questions

At the end of the interview, don’t forget to ask questions that show your interest in the company and the role. Inquire about their current challenges with large-scale data migrations or how they ensure the reliability of their globally distributed compute platform. This shows you're not just interested in the job, but also in contributing to their success.
