Site Reliability Engineering Lead - London

London | Full-Time | 80,000 - 100,000 € / year (est.) | No home office possible

At a Glance

Tasks: Lead and mentor the SRE team while ensuring system reliability and performance.
Company: Dynamic tech company in London focused on innovation and reliability.
Benefits: Competitive salary, flexible working hours, and opportunities for professional growth.
Other info: Join a collaborative environment where you can tackle complex challenges and grow your career.
Why this job: Make a real impact by shaping the culture of reliability and driving continuous improvements.
Qualifications: Strong software engineering background with experience in incident management and observability tools.

The predicted salary is between 80,000 and 100,000 € per year.

We’re looking for a true SRE leader with a strong software engineering background. This isn’t a DevOps “on-call only” role — you’ll need to be comfortable reading and writing production code, deeply understanding application behaviour, and working alongside developers as a technical peer. You’ll lead and mentor the SRE team, setting direction and raising the bar for reliability across our systems. You’ll take end-to-end ownership of production, ensuring availability, performance, and effective incident response, while defining SLIs and partnering with Product on meaningful SLOs and error budgets.

In practice, that means you’ll:

Own production systems (availability, performance, incident response)
Define SLIs/SLOs and use error budgets to guide decisions
Run incident management, on-call, and blameless postmortems
Get hands-on with code (PHP, Java/.NET) to troubleshoot and improve reliability
Drive automation and reduce operational toil
Build observability that gives real insight into system health
Partner with engineers to embed reliability into the SDLC

A big part of the role is shaping culture — creating a blameless environment, improving how we respond to incidents, and driving continuous, systemic improvements. You’ll also lead on capacity planning, performance optimisation, and cost efficiency as the platform scales. We’re looking for someone who brings strong technical leadership, communicates clearly (especially during incidents), and takes real ownership of problems through to resolution. You should be comfortable operating at scale, have deep experience with SLIs/SLOs, incident management, and observability tooling, and be at home working with Linux, databases, cloud platforms (ideally Azure), Kubernetes, and Infrastructure as Code. Just as importantly, you should enjoy tackling complex, imperfect systems — and turning them into something reliable, scalable, and well-understood.

Site Reliability Engineering Lead - London employer: SGI

Join a forward-thinking company in London that values innovation and technical excellence, where you can lead a talented Site Reliability Engineering team. With a strong emphasis on employee growth, we foster a collaborative work culture that encourages continuous learning and improvement, ensuring you have the resources and support to excel in your role. Enjoy unique benefits such as flexible working arrangements and a commitment to creating a blameless environment that empowers you to take ownership of challenges and drive meaningful change.

Contact Details:

SGI Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land Site Reliability Engineering Lead - London

✨Tip Number 1

Network like a pro! Reach out to your connections in the SRE field and let them know you're on the lookout for opportunities. Attend meetups or tech events where you can chat with industry folks and get the inside scoop on potential openings.

✨Tip Number 2

Show off your skills! Create a portfolio or GitHub repository showcasing your projects, especially those involving production code and incident management. This will give potential employers a taste of your technical prowess and problem-solving abilities.

✨Tip Number 3

Prepare for interviews by brushing up on your knowledge of SLIs, SLOs, and incident response strategies. Be ready to discuss real-life scenarios where you've improved system reliability or handled incidents effectively — this is your chance to shine!

✨Tip Number 4

Don’t forget to apply through our website! We’re always on the lookout for talented individuals who can lead and mentor teams. Your next big opportunity could be just a click away, so make sure to check out our listings regularly.

We think you need these skills to ace Site Reliability Engineering Lead - London

Software Engineering

Production Code Proficiency

Application Behaviour Understanding

Team Leadership

Incident Management

SLI/SLO Definition

Error Budget Management

Hands-on Coding (PHP, Java/.NET)

Automation

Observability

Capacity Planning

Performance Optimisation

Cost Efficiency

Linux

Cloud Platforms (Azure)

Kubernetes

Infrastructure as Code

Some tips for your application 🫡

Show Your Technical Skills: Make sure to highlight your software engineering background in your application. We want to see your experience with production code and how you've tackled reliability issues in the past.

Be Clear About Your Leadership Style: Since this role involves leading and mentoring, share examples of how you've shaped team culture and improved incident response. We love hearing about your approach to creating a blameless environment!

Demonstrate Your Problem-Solving Skills: We’re looking for someone who takes ownership of problems. In your application, give us a glimpse into how you've resolved complex issues and improved system reliability in previous roles.

Apply Through Our Website: Don’t forget to submit your application through our website! It’s the best way for us to keep track of your application and ensure it gets the attention it deserves.

How to prepare for a job interview at SGI

✨Know Your Tech Inside Out

Make sure you’re well-versed in the technologies mentioned in the job description, like PHP, Java/.NET, and cloud platforms like Azure. Brush up on your coding skills and be ready to discuss how you've used these technologies to improve system reliability in past roles.

✨Showcase Your Leadership Skills

Prepare examples of how you've led teams or mentored others in previous positions. Highlight your experience in creating a blameless culture and how you’ve driven continuous improvements in incident management and operational efficiency.

✨Understand SLIs and SLOs

Be ready to discuss your experience with defining SLIs and SLOs. Think of specific instances where you’ve used error budgets to guide decisions and how that impacted system performance and reliability.
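If it helps to rehearse the arithmetic behind an error budget, here is a minimal sketch in Python. It assumes a request-based availability SLI (good requests divided by total requests); the function name `error_budget_report` and the example figures are hypothetical and are not taken from the job description.

```python
def error_budget_report(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarise error budget usage for a request-based availability SLO.

    slo_target is a fraction, e.g. 0.999 for a "three nines" objective.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests < 0 or failed_requests < 0:
        raise ValueError("request counts must be non-negative")

    # The error budget is the share of requests the SLO allows to fail.
    allowed_failures = (1 - slo_target) * total_requests

    # Fraction of the budget already spent; infinite if no failures are allowed
    # but some occurred, zero if there is nothing to measure.
    if allowed_failures > 0:
        budget_consumed = failed_requests / allowed_failures
    else:
        budget_consumed = float("inf") if failed_requests else 0.0

    return {
        "allowed_failures": allowed_failures,
        "remaining_failures": allowed_failures - failed_requests,
        "budget_consumed": budget_consumed,
    }


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 250 failures, 99.9% SLO.
    report = error_budget_report(0.999, 1_000_000, 250)
    print(f"Budget consumed: {report['budget_consumed']:.0%}")
```

In an interview you could use numbers like these to explain a decision: with about a quarter of the budget spent, a team might reasonably keep shipping features, whereas a budget close to exhaustion would argue for pausing risky releases and prioritising reliability work.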

✨Communicate Clearly Under Pressure

Since this role involves incident management, practice articulating your thought process during high-pressure situations. Prepare to explain how you’ve handled incidents in the past, focusing on your communication style and how you ensured effective responses.
