At a Glance
- Tasks: Lead the reliability and scalability of cloud infrastructure while mentoring a tech-savvy team.
- Company: Join a dynamic financial services client focused on innovation and engineering excellence.
- Benefits: Enjoy competitive pay, opportunities for professional growth, and a collaborative work environment.
- Why this job: Be at the forefront of technology, driving impactful solutions in a supportive and innovative culture.
- Qualifications: Proven leadership in SRE/DevOps with hands-on experience in Azure/AWS and a passion for mentoring.
- Other info: This is a full-time role ideal for those looking to make a significant impact in tech.
The predicted salary is between £54,000 and £84,000 per year.
Base Pay Range
This range is provided by Mentmore. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more.
My financial services client is looking for a Lead Site Reliability Engineer who will be responsible for ensuring the reliability and scalability of their infrastructure and services. This is a senior role requiring technical expertise, leadership, and a commitment to continuous improvement. You must have team lead/mentoring experience and be able to balance technical delivery, team productivity, performance measurement, and collaboration across teams and stakeholders.
Duties & Responsibilities:
Hands-On Engineering & Technical Leadership:
- Design, develop, and maintain cloud infrastructure (Azure/AWS) using Terraform and automation.
- Lead troubleshooting, performance optimisation, and incident resolution to enhance reliability.
- Ensure best practices in CI/CD pipelines, observability, and infrastructure deployment.
- Promote Transparency, Inspection, and Adaptation by making both system and team health data accessible and actionable.
- Work with engineering leads and business stakeholders to define and enforce SLAs, SLOs, and engineering standards that support scalability, reliability, and operational efficiency.
- Design solutions with a systems-thinking approach, ensuring infrastructure, observability, and automation strategies support sustainable growth.
- Improve deployment pipelines, automation, and operational workflows across squads, fostering consistency and best practices.
- Support capacity planning, scalability, and security best practices, proactively identifying risks and opportunities to enhance platform resilience.
Experience Required:
- Proven leadership experience in technical teams, with a focus on mentoring, professional development, and fostering a culture of innovation, reliability, and engineering excellence.
- Proven experience in Site Reliability Engineering, DevOps, or Systems Engineering, with hands-on experience in both Azure and AWS environments.
- Demonstrable expertise in high-performance, scalable, and highly available systems, with experience in optimising reliability, capacity planning, and system performance.
- Deep expertise in DevOps principles, including automation, infrastructure as code (Terraform, Ansible, or Chef), GitOps workflows, CI/CD best practices (GitHub Actions, GitLab CI/CD, Azure DevOps), and collaborative ways of working.
- Strong background in containerisation (Docker) and orchestration (Kubernetes), with a focus on scalability and resilience.
- Hands-on experience with monitoring, observability, and incident management tools (Prometheus, Grafana, ELK, Azure Monitor, Application Insights, Kusto) and a data-driven approach to improving system reliability.
- Strategic mindset, able to align technical initiatives with business goals, drive scalability and performance improvements, and proactively tackle complex challenges.
- Strong understanding of regulatory and security requirements, such as ISO 27001, PCI DSS, Cyber Essentials Plus (CE+), and SOX, with experience implementing compliance-driven engineering practices.
- Advocate for modern DevOps and SRE best practices, championing collaboration, transparency, automation, continuous learning, and continuous improvement across teams.
- Excellent communication skills, able to engage stakeholders, collaborate cross-functionally, and drive alignment on reliability and operational priorities.
Seniority Level
Mid-Senior level
Employment Type
Full-time
Job Function
Information Technology
Industries
Insurance
Lead Site Reliability Engineer employer: Mentmore
Contact Detail:
Mentmore Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Lead Site Reliability Engineer role
✨Tip Number 1
Make sure to showcase your leadership experience in technical teams during the interview. Be prepared to discuss specific examples of how you've mentored team members and fostered a culture of innovation and reliability.
✨Tip Number 2
Highlight your hands-on experience with cloud infrastructure, particularly in Azure and AWS. Be ready to talk about projects where you designed and maintained infrastructure using Terraform and automation tools.
✨Tip Number 3
Demonstrate your understanding of DevOps principles and best practices. Prepare to discuss how you've implemented CI/CD pipelines and improved operational workflows in previous roles.
✨Tip Number 4
Be proactive in discussing your experience with monitoring and observability tools. Share how you've used data-driven approaches to enhance system reliability and tackle complex challenges.
Some tips for your application 🫡
Tailor Your Resume: Make sure your resume highlights your leadership experience in technical teams, especially in Site Reliability Engineering and DevOps. Emphasize your hands-on experience with Azure and AWS, as well as your expertise in automation and infrastructure as code.
Craft a Compelling Cover Letter: In your cover letter, discuss your strategic mindset and how you align technical initiatives with business goals. Mention specific examples of how you've improved system reliability and performance in previous roles.
Showcase Relevant Skills: Clearly list your skills related to CI/CD best practices, containerisation, and monitoring tools. Highlight your experience with Terraform, Docker, Kubernetes, and any incident management tools you've used.
Prepare for Technical Questions: Be ready to discuss your approach to troubleshooting, performance optimisation, and incident resolution. Prepare examples that demonstrate your ability to foster a culture of innovation and reliability within your team.
How to prepare for a job interview at Mentmore
✨Showcase Your Technical Expertise
Be prepared to discuss your hands-on experience with cloud infrastructure, particularly in Azure and AWS. Highlight specific projects where you used Terraform for automation and how you approached troubleshooting and performance optimisation.
✨Demonstrate Leadership Skills
Since this role requires mentoring and leading technical teams, share examples of how you've fostered a culture of innovation and reliability. Discuss your approach to team productivity and how you've successfully balanced technical delivery with team dynamics.
✨Align with Business Goals
Prepare to explain how your technical initiatives have aligned with business objectives in the past. Discuss your experience in defining SLAs and SLOs, and how you've driven scalability and operational efficiency through strategic thinking.
✨Emphasize Continuous Improvement
Talk about your commitment to continuous learning and improvement in your previous roles. Share specific examples of how you've implemented best practices in CI/CD pipelines and incident management, and how you've used data to enhance system reliability.