At a Glance
- Tasks: Lead the reliability and scalability of cloud infrastructure using Azure/AWS and Terraform.
- Company: Join a leading financial services firm focused on innovation and engineering excellence.
- Benefits: Enjoy a collaborative culture, professional development opportunities, and a chance to make an impact.
- Why this job: Be at the forefront of technology, driving improvements in system reliability and team productivity.
- Qualifications: Proven leadership in technical teams with expertise in Site Reliability Engineering and cloud environments.
- Other info: This is a senior role ideal for those passionate about mentoring and fostering innovation.
The predicted salary is between £72,000 and £100,000 per year.
Job Description
Lead Site Reliability Engineer – Azure/AWS – Terraform – Engineering – London
My financial services client is looking for a Lead Site Reliability Engineer who will be responsible for ensuring the reliability and scalability of their infrastructure and services. This is a senior role requiring technical expertise, leadership, and a commitment to continuous improvement. You must have team lead/mentoring experience and be able to balance technical delivery, team productivity, performance measurement, and collaboration across teams and stakeholders.
Duties & Responsibilities:
- Hands-On Engineering & Technical Leadership
- Design, develop, and maintain cloud infrastructure (Azure/AWS) using Terraform and automation.
- Lead troubleshooting, performance optimisation, and incident resolution to enhance reliability.
- Ensure best practices in CI/CD pipelines, observability, and infrastructure deployment.
- Promote Transparency, Inspection, and Adaptation by making both system and team health data accessible and actionable.
- Work with engineering leads, business stakeholders, and the Head of Platform Operations to define and enforce SLAs, SLOs, and engineering standards that support scalability, reliability, and operational efficiency.
- Design solutions with a systems-thinking approach, ensuring infrastructure, observability, and automation strategies support sustainable growth.
- Improve deployment pipelines, automation, and operational workflows across squads, fostering consistency and best practices.
- Support capacity planning, scalability, and security best practices, proactively identifying risks and opportunities to enhance platform resilience.
- Team Productivity, Performance & Agile Ways of Working
Experience Required:
- Proven leadership experience in technical teams, with a focus on mentoring, professional development, and fostering a culture of innovation, reliability, and engineering excellence.
- Proven experience in Site Reliability Engineering, DevOps, or Systems Engineering, with hands-on experience in both Azure and AWS environments.
- Demonstrable expertise in high-performance, scalable, and highly available systems, with experience in optimising reliability, capacity planning, and system performance.
- Deep expertise in DevOps principles, including automation, infrastructure as code (Terraform, Ansible, or Chef), GitOps workflows, CI/CD best practices (GitHub Actions, GitLab CI/CD, Azure DevOps), and collaborative ways of working.
- Strong background in containerisation (Docker) and orchestration (Kubernetes), with a focus on scalability and resilience.
- Hands-on experience with monitoring, observability, and incident management tools (Prometheus, Grafana, ELK, Azure Monitor, Application Insights, Kusto) and a data-driven approach to improving system reliability.
- Strategic mindset, able to align technical initiatives with business goals, drive scalability and performance improvements, and proactively tackle complex challenges.
- Strong understanding of regulatory and security requirements, such as ISO 27001, PCI DSS, Cyber Essentials Plus (CE+) and SOX, with experience implementing compliance-driven engineering practices.
- Advocate for modern DevOps and SRE best practices, championing collaboration, transparency, automation, continuous learning, and continuous improvement across teams.
- Excellent communication skills, able to engage stakeholders, collaborate cross-functionally, and drive alignment on reliability and operational priorities.
Lead Site Reliability Engineer - Azure - Engineering
Employer: Mentmore Recruitment
Contact Details:
Mentmore Recruitment Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Lead Site Reliability Engineer - Azure - Engineering role
✨Tip Number 1
Make sure to showcase your leadership experience in technical teams during the interview. Highlight specific examples where you mentored team members or fostered a culture of innovation and reliability.
✨Tip Number 2
Prepare to discuss your hands-on experience with Azure and AWS, especially in relation to Terraform and automation. Be ready to explain how you've designed and maintained cloud infrastructure in previous roles.
✨Tip Number 3
Familiarize yourself with the latest DevOps principles and tools mentioned in the job description, such as GitOps workflows and CI/CD best practices. Being able to discuss these in detail will demonstrate your expertise.
✨Tip Number 4
Show your strategic mindset by preparing to explain how you have aligned technical initiatives with business goals. Think of examples where you've tackled complex challenges that improved scalability and performance.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV highlights your leadership experience in technical teams, especially in Site Reliability Engineering and DevOps. Emphasize your hands-on experience with Azure and AWS, as well as your expertise in Terraform and automation.
Craft a Strong Cover Letter: In your cover letter, express your passion for reliability and scalability in cloud infrastructure. Mention specific projects where you led teams to improve performance and incident resolution, showcasing your strategic mindset and ability to align technical initiatives with business goals.
Showcase Relevant Skills: Clearly list your skills related to CI/CD best practices, containerization (Docker), and orchestration (Kubernetes). Highlight your experience with monitoring tools like Prometheus and Grafana, and your understanding of regulatory requirements such as ISO 27001 and PCI DSS.
Prepare for Technical Questions: Be ready to discuss your technical expertise in detail during the interview process. Prepare examples of how you've implemented DevOps principles, improved deployment pipelines, and fostered a culture of innovation and reliability within your teams.
How to prepare for a job interview at Mentmore Recruitment
✨Showcase Your Technical Expertise
Be prepared to discuss your hands-on experience with Azure and AWS, as well as your proficiency in Terraform and automation. Highlight specific projects where you've successfully implemented these technologies to enhance reliability and scalability.
✨Demonstrate Leadership Skills
Since this is a senior role, emphasize your leadership experience. Share examples of how you've mentored team members, fostered a culture of innovation, and improved team productivity through effective collaboration and communication.
✨Discuss Your Approach to Problem-Solving
Prepare to talk about your strategies for troubleshooting and incident resolution. Provide examples of how you've optimized performance and enhanced system reliability, focusing on your data-driven approach to decision-making.
✨Align Technical Initiatives with Business Goals
Articulate how your technical decisions have supported broader business objectives. Discuss your experience in defining SLAs and SLOs, and how you've ensured that engineering standards align with operational efficiency and scalability.