Senior Site Reliability Engineer in Manchester

Manchester | Full-Time | £60,000 - £80,000 per year (est.) | Working from home possible
NatWest Group

At a Glance

  • Tasks: Ensure the reliability and performance of critical production platforms while leading SRE practices.
  • Company: Join an inclusive tech team committed to innovation and professional growth.
  • Benefits: Flexible remote work, competitive salary, and opportunities for continuous learning.
  • Other info: Dynamic role with 24/7 support and excellent career advancement opportunities.
  • Why this job: Make a real impact on high-availability systems and drive operational excellence.
  • Qualifications: Experience with AWS, Kubernetes, and strong incident management skills required.

Join us as a Senior Site Reliability Engineer. In this key role, you’ll improve and drive the availability, performance, efficiency, change management, monitoring, security, incident response, and capacity planning for our products and services. You’ll enjoy significant stakeholder interaction, working in collaboration with engineers to ensure a principled approach to delivering change in a safe and secure way. This is a chance to join an inclusive team with a collaborative ethos and a commitment to innovation and professional development. You’ll need to have the flexibility to support the team by working shifts and weekends on rotation.

As a Senior Site Reliability Engineer, you’ll act as a hands-on expert responsible for ensuring the reliability, availability, and performance of critical production platforms. You’ll lead the adoption of Site Reliability Engineering (SRE) practices, embedding resilience, observability, and operational excellence into distributed systems running on AWS and Kubernetes. You’ll also take ownership of 24/7 production support models, ensuring systems are highly available and that incidents are effectively managed and learned from.

We’ll also expect you to design and operate highly resilient AWS-based Kubernetes platforms (EKS) aligned with enterprise standards, while owning and continuously improving production reliability, availability, and Service Level Agreement or Service Level Objective (SLA/SLO) frameworks. You’ll lead incident management, escalation, and 24/7 on-call practices, including post-incident reviews, and embed SRE principles such as error budgets, toil reduction, and reliability engineering into delivery teams. Furthermore, you’ll implement infrastructure and platform automation using Terraform and GitOps methodologies, and drive self-healing, auto-scaling, and failure recovery mechanisms using tools such as Karpenter.

In addition to this, you’ll be:

  • Building secure and scalable networking and service communication using tools such as Cilium
  • Defining and operating observability platforms using Grafana, Prometheus, Loki, and Tempo
  • Partnering with DevOps and engineering teams to ensure production readiness and operational excellence
  • Leading complex troubleshooting across distributed systems and cloud-native environments
  • Developing reusable “golden paths,” operational runbooks, and reliability patterns
  • Ensuring platforms meet regulatory, security, and operational risk requirements
  • Using data, Service Level Indicators (SLIs), and metrics to drive continuous improvement and proactive reliability enhancements

The skills you'll need:

We’re looking for a highly experienced Site Reliability Engineer with a strong background in operating large-scale, business-critical platforms and a passion for reliability engineering. You must also have deep expertise in managing production systems on AWS and Kubernetes (EKS), along with strong experience in 24/7 support models, incident management, and on-call leadership.

Moreover, you’ll need to demonstrate advanced knowledge of SRE principles such as SLIs, SLOs, error budgets, and toil reduction, as well as proficiency in Terraform, GitOps, and cloud automation practices. Hands-on experience with GitLab continuous integration and continuous delivery pipelines and Argo CD is also essential.

In addition, you’ll have to bring:

  • A strong understanding of Kubernetes networking, security, and service mesh technologies, ideally using Cilium
  • Experience scaling infrastructure using Karpenter and auto-scaling strategies
  • Expertise in observability tooling, including Grafana, Prometheus, Loki, and Tempo
  • A proven ability to troubleshoot and resolve complex, cross-system production issues
  • Experience operating in regulated or high-security environments
  • Strong leadership, mentoring, and stakeholder engagement capabilities
  • The ability to balance reliability, risk, and delivery in a fast-paced environment

Hours: 35

Job Posting Closing Date: 03/06/2026

Ways of Working: Remote First

About the employer: NatWest Group

As a Senior Site Reliability Engineer, you will thrive in a dynamic and inclusive work environment that prioritises innovation and professional growth. Our remote-first culture allows for flexibility while ensuring you have the support of a collaborative team dedicated to operational excellence and continuous improvement. With opportunities for leadership and hands-on expertise in cutting-edge technologies like AWS and Kubernetes, this role offers a meaningful career path in a company that values your contributions and fosters a commitment to reliability and security.

NatWest Group

Contact Details:

NatWest Group Recruitment Team

StudySmarter Expert Advice

We think this is how you could land the Senior Site Reliability Engineer role in Manchester

Tip Number 1

Network like a pro! Reach out to your connections in the industry, attend meetups, and engage in online forums. You never know who might have the inside scoop on job openings or can put in a good word for you.

Tip Number 2

Show off your skills! Create a portfolio or GitHub repository showcasing your projects and open-source contributions. This is a great way to demonstrate your expertise in AWS, Kubernetes, and SRE practices to potential employers.

Tip Number 3

Prepare for interviews by practising common technical questions and scenarios related to Site Reliability Engineering. Use mock interviews with friends or online platforms to get comfortable discussing your experience and problem-solving approach.

Tip Number 4

Don’t forget to apply through our website! We love seeing candidates who are genuinely interested in joining our team. Tailor your application to highlight your relevant experience and how you can contribute to our collaborative ethos.

We think you need these skills to ace the Senior Site Reliability Engineer role in Manchester

Site Reliability Engineering (SRE)
AWS
Kubernetes (EKS)
Incident Management
24/7 Support Models
Terraform
GitOps

Some tips for your application

Tailor Your CV: Make sure your CV is tailored to the Senior Site Reliability Engineer role. Highlight your experience with AWS, Kubernetes, and SRE principles. We want to see how your skills align with what we’re looking for!

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Share your passion for reliability engineering and how you’ve tackled challenges in previous roles. Let us know why you’re excited about joining the team at NatWest Group.

Showcase Your Technical Skills: Don’t forget to highlight your technical expertise! Mention your experience with Terraform, GitOps, and observability tools like Grafana and Prometheus. We love seeing candidates who can demonstrate their hands-on skills.

Apply Through Our Website: We encourage you to apply through our website for a smoother application process. It helps us keep track of your application and ensures you don’t miss any important updates from us!

How to prepare for a job interview at NatWest Group

Know Your SRE Principles

Make sure you brush up on your knowledge of Site Reliability Engineering principles like SLIs, SLOs, and error budgets. Be ready to discuss how you've applied these concepts in past roles, as this will show your depth of understanding and practical experience.

Showcase Your Technical Skills

Prepare to demonstrate your expertise in AWS, Kubernetes, Terraform, and GitOps. Bring examples of projects where you've implemented these technologies, especially focusing on how you’ve improved reliability and performance in production systems.

Prepare for Scenario-Based Questions

Expect scenario-based questions that test your problem-solving skills in real-time incidents. Think about past incidents you've managed, how you approached them, and what you learned. This will highlight your hands-on experience and ability to lead under pressure.

Engage with Stakeholders

Since the role involves significant stakeholder interaction, be prepared to discuss how you've collaborated with different teams in the past. Share examples of how you’ve communicated technical concepts to non-technical stakeholders, showcasing your ability to bridge gaps and foster collaboration.