Site Reliability Engineering Lead - London

London | Full-Time | 80,000 - 100,000 € / year (est.) | No home office possible
SGI

At a Glance

  • Tasks: Lead and mentor the SRE team while ensuring system reliability and performance.
  • Company: Dynamic tech company in London focused on innovation and reliability.
  • Benefits: Competitive salary, flexible working hours, and opportunities for professional growth.
  • Other info: Join a collaborative environment where you can tackle complex challenges and grow your career.
  • Why this job: Make a real impact by shaping the culture of reliability and driving continuous improvements.
  • Qualifications: Strong software engineering background with experience in incident management and observability tools.

The predicted salary is between 80,000 and 100,000 € per year.

We’re looking for a true SRE leader with a strong software engineering background. This isn’t a DevOps “on-call only” role — you’ll need to be comfortable reading and writing production code, deeply understanding application behaviour, and working alongside developers as a technical peer. You’ll lead and mentor the SRE team, setting direction and raising the bar for reliability across our systems. You’ll take end-to-end ownership of production, ensuring availability, performance, and effective incident response, while defining SLIs and partnering with Product on meaningful SLOs and error budgets.

In practice, that means you’ll:

  • Own production systems (availability, performance, incident response)
  • Define SLIs/SLOs and use error budgets to guide decisions
  • Run incident management, on-call, and blameless postmortems
  • Get hands-on with code (PHP, Java/.NET) to troubleshoot and improve reliability
  • Drive automation and reduce operational toil
  • Build observability that gives real insight into system health
  • Partner with engineers to embed reliability into the SDLC
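
To make the SLI/SLO and error-budget point above more concrete, here is a minimal Python sketch of how an error budget can be derived from an availability SLO and used to guide a decision. It is illustrative only: the names `AvailabilitySLO`, `error_budget_remaining`, and `should_freeze_releases`, the 99.9% target, and the 10% freeze threshold are assumptions made for this example, not a description of SGI's tooling or policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilitySLO:
    """Hypothetical SLO: the fraction of requests that should succeed in a window."""
    target: float          # e.g. 0.999 means 99.9% of requests should succeed
    window_days: int = 30  # informational only in this sketch


def error_budget_remaining(slo: AvailabilitySLO, total_requests: int,
                           failed_requests: int) -> float:
    """Return the unspent fraction of the error budget (negative if overspent)."""
    if not 0.0 < slo.target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    if total_requests == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    # The budget is the number of failures the SLO tolerates for this traffic.
    allowed_failures = (1.0 - slo.target) * total_requests
    return 1.0 - failed_requests / allowed_failures


def should_freeze_releases(remaining: float, threshold: float = 0.10) -> bool:
    """Assumed policy: pause risky releases once under 10% of the budget is left."""
    return remaining < threshold


slo = AvailabilitySLO(target=0.999)
remaining = error_budget_remaining(slo, total_requests=1_000_000, failed_requests=250)
print(f"budget remaining: {remaining:.0%}")   # roughly 75% of the budget left
print("freeze releases:", should_freeze_releases(remaining))
```

In this sketch the SLI is the ratio of successful to total requests; a real setup would usually pull those counts from the observability stack and agree the threshold with Product.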

A big part of the role is shaping culture: creating a blameless environment, improving how we respond to incidents, and driving continuous, systemic improvements. You’ll also lead on capacity planning, performance optimisation, and cost efficiency as the platform scales.

We’re looking for someone who brings strong technical leadership, communicates clearly (especially during incidents), and takes real ownership of problems through to resolution. You should be comfortable operating at scale, have deep experience with SLIs/SLOs, incident management, and observability tooling, and be at home working with Linux, databases, cloud platforms (ideally Azure), Kubernetes, and Infrastructure as Code. Just as importantly, you should enjoy tackling complex, imperfect systems and turning them into something reliable, scalable, and well-understood.

Site Reliability Engineering Lead - London employer: SGI

Join a forward-thinking company in London that values innovation and technical excellence, where you can lead a talented Site Reliability Engineering team. With a strong emphasis on employee growth, we foster a collaborative work culture that encourages continuous learning and improvement, ensuring you have the resources and support to excel in your role. Enjoy unique benefits such as flexible working arrangements and a commitment to creating a blameless environment that empowers you to take ownership of challenges and drive meaningful change.

Contact Detail:

SGI Recruiting Team

StudySmarter Expert Advice🤫

We think this is how you could land the Site Reliability Engineering Lead - London role

Tip Number 1

Network like a pro! Reach out to your connections in the SRE field and let them know you're on the lookout for opportunities. Attend meetups or tech events where you can chat with industry folks and get the inside scoop on potential openings.

Tip Number 2

Show off your skills! Create a portfolio or GitHub repository showcasing your projects, especially those involving production code and incident management. This will give potential employers a taste of your technical prowess and problem-solving abilities.

Tip Number 3

Prepare for interviews by brushing up on your knowledge of SLIs, SLOs, and incident response strategies. Be ready to discuss real-life scenarios where you've improved system reliability or handled incidents effectively — this is your chance to shine!

Tip Number 4

Don’t forget to apply through our website! We’re always on the lookout for talented individuals who can lead and mentor teams. Your next big opportunity could be just a click away, so make sure to check out our listings regularly.

We think you need these skills to ace the Site Reliability Engineering Lead - London role

Software Engineering
Production Code Proficiency
Application Behaviour Understanding
Team Leadership
Incident Management
SLI/SLO Definition
Error Budget Management

Some tips for your application 🫡

Show Your Technical Skills: Make sure to highlight your software engineering background in your application. We want to see your experience with production code and how you've tackled reliability issues in the past.

Be Clear About Your Leadership Style: Since this role involves leading and mentoring, share examples of how you've shaped team culture and improved incident response. We love hearing about your approach to creating a blameless environment!

Demonstrate Your Problem-Solving Skills: We’re looking for someone who takes ownership of problems. In your application, give us a glimpse into how you've resolved complex issues and improved system reliability in previous roles.

Apply Through Our Website: Don’t forget to submit your application through our website! It’s the best way for us to keep track of your application and ensure it gets the attention it deserves.

How to prepare for a job interview at SGI

Know Your Tech Inside Out

Make sure you’re well-versed in the technologies mentioned in the job description, such as PHP, Java/.NET, and cloud platforms (ideally Azure). Brush up on your coding skills and be ready to discuss how you've used these technologies to improve system reliability in past roles.

Showcase Your Leadership Skills

Prepare examples of how you've led teams or mentored others in previous positions. Highlight your experience in creating a blameless culture and how you’ve driven continuous improvements in incident management and operational efficiency.

Understand SLIs and SLOs

Be ready to discuss your experience with defining SLIs and SLOs. Think of specific instances where you’ve used error budgets to guide decisions and how that impacted system performance and reliability.

Communicate Clearly Under Pressure

Since this role involves incident management, practice articulating your thought process during high-pressure situations. Prepare to explain how you’ve handled incidents in the past, focusing on your communication style and how you ensured effective responses.