At a Glance
- Tasks: Lead and mentor the SRE team while ensuring system reliability and performance.
- Company: Dynamic tech company in London focused on innovation and reliability.
- Benefits: Competitive salary, flexible working hours, and opportunities for professional growth.
- Other info: Join a culture of continuous improvement and collaboration in a fast-paced environment.
- Why this job: Shape the future of our systems and make a real impact on reliability.
- Qualifications: Strong software engineering background and experience with SLIs/SLOs and incident management.
The predicted salary is between £80,000 and £100,000 per year.
We’re looking for a true SRE leader with a strong software engineering background. This isn’t a DevOps “on-call only” role — you’ll need to be comfortable reading and writing production code, deeply understanding application behaviour, and working alongside developers as a technical peer. You’ll lead and mentor the SRE team, setting direction and raising the bar for reliability across our systems. You’ll take end-to-end ownership of production, ensuring availability, performance, and effective incident response, while defining SLIs and partnering with Product on meaningful SLOs and error budgets.
In practice, that means you’ll:
- Own production systems (availability, performance, incident response)
- Define SLIs/SLOs and use error budgets to guide decisions
- Run incident management, on-call, and blameless postmortems
- Get hands-on with code (PHP, Java/.NET) to troubleshoot and improve reliability
- Drive automation and reduce operational toil
- Build observability that gives real insight into system health
- Partner with engineers to embed reliability into the SDLC
A big part of the role is shaping culture — creating a blameless environment, improving how we respond to incidents, and driving continuous, systemic improvements. You’ll also lead on capacity planning, performance optimisation, and cost efficiency as the platform scales.
We’re looking for someone who brings strong technical leadership, communicates clearly (especially during incidents), and takes real ownership of problems through to resolution. You should be comfortable operating at scale, have deep experience with SLIs/SLOs, incident management, and observability tooling, and be at home working with Linux, databases, cloud platforms (ideally Azure), Kubernetes, and Infrastructure as Code. Just as importantly, you should enjoy tackling complex, imperfect systems — and turning them into something reliable, scalable, and well-understood.
Site Reliability Engineering Lead - London | London, UK employer: SGI
Contact Detail:
SGI Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Site Reliability Engineering Lead role in London, UK
✨Tip Number 1
Network like a pro! Reach out to your connections in the SRE field and let them know you're on the lookout for opportunities. Attend meetups or tech events in London to meet potential employers and fellow SREs. You never know who might have the inside scoop on a job opening!
✨Tip Number 2
Show off your skills! Create a portfolio or GitHub repository showcasing your projects, especially those involving incident management, SLIs/SLOs, and automation. This will give potential employers a taste of your hands-on experience and problem-solving abilities.
✨Tip Number 3
Prepare for technical interviews by brushing up on your coding skills and understanding of production systems. Practice common SRE scenarios, like incident response and performance optimisation, so you can demonstrate your expertise during interviews. We believe in you!
✨Tip Number 4
Don’t forget to apply through our website! It’s the best way to ensure your application gets noticed. Plus, we love seeing candidates who are proactive about their job search. So, get your application in and let’s make some reliability magic happen together!
Some tips for your application 🫡
Show Your Technical Skills: Make sure to highlight your software engineering background in your application. We want to see your experience with production code and how you've tackled reliability issues in the past.
Be Clear About Your Leadership Style: Since this role involves leading and mentoring, share examples of how you've shaped team culture and improved incident response. We love seeing candidates who can communicate effectively and foster a blameless environment.
Demonstrate Your Problem-Solving Abilities: We’re looking for someone who takes ownership of problems. In your application, include specific instances where you’ve resolved complex issues or improved system performance. Show us how you think!
Apply Through Our Website: Don’t forget to submit your application through our website! It’s the best way for us to keep track of your application and ensure it gets the attention it deserves.
How to prepare for a job interview at SGI
✨Know Your Tech Inside Out
Make sure you’re well-versed in the technologies mentioned in the job description, like PHP, Java/.NET, and cloud platforms like Azure. Brush up on your coding skills and be ready to discuss how you've used these technologies to improve system reliability in past roles.
✨Showcase Your Leadership Skills
Prepare examples of how you've led teams or projects in the past. Highlight your experience in mentoring others and driving a blameless culture during incidents. This role is about shaping the SRE team, so demonstrate your ability to inspire and guide others.
✨Understand SLIs and SLOs
Be ready to discuss your experience with defining and using SLIs and SLOs. Think of specific instances where you’ve used error budgets to make decisions or improve performance. This shows you understand the metrics that matter in an SRE role.
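If it helps to rehearse the arithmetic behind an error budget before the interview, here is a minimal Python sketch. It assumes a simple request-based availability SLI; the function name `error_budget_report`, the 99.9% target, and the request counts are hypothetical values chosen for illustration, not anything taken from SGI's systems.

```python
def error_budget_report(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarise error-budget consumption for a request-based availability SLO.

    All inputs are illustrative assumptions: slo_target is a fraction such as
    0.999, and the request counts cover a single SLO window (e.g. 30 days).
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    if total_requests <= 0 or not 0 <= failed_requests <= total_requests:
        raise ValueError("request counts are inconsistent")

    # SLI: fraction of requests that succeeded in the window.
    sli = (total_requests - failed_requests) / total_requests
    # Error budget: number of failed requests the SLO tolerates in the window.
    allowed_failures = (1 - slo_target) * total_requests
    # Share of that budget already used (1.0 means fully spent).
    budget_consumed = failed_requests / allowed_failures

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_consumed": budget_consumed,
        "slo_met": sli >= slo_target,
    }


# Hypothetical window: 1,000,000 requests, 400 failures, 99.9% SLO.
report = error_budget_report(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
print(report)
```

With these made-up numbers, 400 failures against a budget of about 1,000 means roughly 40% of the budget is spent. A figure like that is the kind of input you might describe using to decide between shipping features and prioritising reliability work.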
✨Prepare for Incident Management Scenarios
Expect questions around incident management and how you handle on-call situations. Prepare to share your approach to running blameless postmortems and how you’ve improved incident response times in previous roles. This will show your readiness to take ownership of production systems.