At a Glance
- Tasks: Ensure the reliability and performance of critical production platforms while leading SRE practices.
- Company: Join an inclusive team committed to innovation and professional development.
- Benefits: Flexible shifts, competitive salary, and opportunities for growth in a dynamic environment.
- Other info: Lead complex troubleshooting and drive continuous improvement in a fast-paced setting.
- Why this job: Make a real impact on high-availability systems and collaborate with talented engineers.
- Qualifications: Experience in AWS, Kubernetes, and incident management is essential.
The predicted salary is between £60,000 and £80,000 per year.
Join us as a Senior Site Reliability Engineer. In this key role, you’ll improve and drive the availability, performance, efficiency, change management, monitoring, security, incident response, and capacity planning for our products and services. You’ll enjoy significant stakeholder interaction, working in collaboration with engineers to ensure a principled approach to delivering change in a safe and secure way. This is a chance to join an inclusive team with a collaborative ethos and a commitment to innovation and professional development. You’ll need to have the flexibility to support the team by working shifts and weekends on rotation.
What you'll do:
- Act as a hands-on expert responsible for ensuring the reliability, availability, and performance of critical production platforms.
- Lead the adoption of Site Reliability Engineering (SRE) practices, embedding resilience, observability, and operational excellence into distributed systems running on AWS and Kubernetes.
- Take ownership of 24/7 production support models, ensuring systems are highly available and that incidents are effectively managed and learned from.
- Design and operate highly resilient AWS-based Kubernetes platforms (EKS) aligned with enterprise standards while owning and continuously improving production reliability, availability, and Service Level Agreement or Service Level Objective (SLA/SLO) frameworks.
- Lead incident management, escalation, and 24/7 on-call practices, including post-incident reviews, and embed SRE principles such as error budgets, toil reduction, and reliability engineering into delivery teams.
- Implement infrastructure and platform automation using Terraform and GitOps methodologies and drive self-healing, auto-scaling, and failure recovery mechanisms using tools such as Karpenter.
- Build secure and scalable networking and service communication using tools such as Cilium.
- Define and operate observability platforms using Grafana, Prometheus, Loki, and Tempo.
- Partner with DevOps and engineering teams to ensure production readiness and operational excellence.
- Lead complex troubleshooting across distributed systems and cloud-native environments.
- Develop reusable “golden paths,” operational runbooks, and reliability patterns.
- Ensure platforms meet regulatory, security, and operational risk requirements.
- Use data, Service Level Indicators (SLIs), and metrics to drive continuous improvement and proactive reliability enhancements.
The skills you'll need:
- A strong background in operating large-scale, business-critical platforms and a passion for reliability engineering.
- Deep expertise in managing production systems on AWS and Kubernetes (EKS), along with strong experience in 24/7 support models, incident management, and on-call leadership.
- Advanced knowledge of SRE principles such as SLIs, SLOs, error budgets, and toil reduction.
- Proficiency in Terraform, GitOps, and cloud automation practices.
- Hands-on experience with GitLab continuous integration and continuous delivery pipelines and Argo CD.
- A strong understanding of Kubernetes networking, security, and service mesh technologies, ideally using Cilium.
- Experience scaling infrastructure using Karpenter and auto-scaling strategies.
- Expertise in observability tooling, including Grafana, Prometheus, Loki, and Tempo.
- A proven ability to troubleshoot and resolve complex, cross-system production issues.
- Experience operating in regulated or high-security environments.
- Strong leadership, mentoring, and stakeholder engagement capabilities.
- The ability to balance reliability, risk, and delivery in a fast-paced environment.
Employer for the Senior Site Reliability Engineer role in Edinburgh: NatWest Group
As a Senior Site Reliability Engineer, you will thrive in an inclusive and collaborative work environment that prioritises innovation and professional growth. Our commitment to employee development is matched by our focus on work-life balance, offering flexible shift patterns and opportunities for continuous learning. Join us to be part of a dynamic team where your expertise in AWS and Kubernetes will directly contribute to the reliability and performance of critical production platforms.
StudySmarter Expert Advice🤫
We think this is how you could land the Senior Site Reliability Engineer role in Edinburgh
✨Join the Tech Community Buzz
Get involved in local or virtual SRE, DevOps, and cloud-native meetups and forums. This is where you can rub shoulders with industry professionals, get insights into what NatWest Group values, and even spot unadvertised opportunities. Don't miss out on these chances to make a name for yourself in the IT world!
✨Show Off Your Skills
Create a personal project or case study relevant to the challenges NatWest Group might face. Use platforms like GitHub or Medium to share your findings. This not only demonstrates your technical skills but also shows a proactive attitude, helping you stand out from the crowd when applying for that full-time role.
✨Leverage LinkedIn for Connections
Follow and engage with relevant thought leaders and influencers in site reliability engineering and cloud-native infrastructure on LinkedIn. Share insightful content and join discussions to gain visibility. A well-placed comment or shared article could catch the attention of someone at NatWest Group!
✨Apply Directly to NatWest Group
Don't forget to apply directly through the NatWest Group website! Tailor your application to show your understanding of the role and how you can contribute to the reliability of their platforms. A personalised approach can make a huge difference in landing that full-time position!
Some tips for your application 🫡
Showcase Your Problem-Solving Skills: In site reliability engineering, much of the work is problem-solving, so make sure your CV highlights your analytical skills and any relevant projects you've tackled. Mention specific technologies or methodologies you've used to resolve issues or improve processes; this shows you can think critically and deliver results, which is vital for us at NatWest Group.
Highlight Relevant Certifications: Certifications like ITIL, PMP, or specific tech stack qualifications can really make you stand out. Make sure to include these in your CV, as they demonstrate not only your expertise but also your commitment to staying current in the field. We love seeing candidates who are proactive about their professional development!
Tailor Your Cover Letter: Your cover letter is your chance to connect personally with us at NatWest Group. Share stories about your experiences in reliability engineering and production support, and how they shaped your desire to join our team. Mention why you’re excited about this particular role and how you see yourself contributing to our projects.
Keep It Clear and Concise: We're all busy, so make sure your application is easy to read. Use bullet points for key achievements, and don’t overload us with jargon. A clean, professional layout goes a long way. Remember, the clearer your application, the more likely we are to invite you in for an interview!
How to prepare for a job interview at NatWest Group
✨Brush Up on Your Technical Skills
For a senior SRE role, be ready to demonstrate your technical prowess. You might face questions on AWS, Kubernetes (EKS), observability, incident management, or troubleshooting distributed systems. If you have experience with the tools named in the posting, such as Terraform, Argo CD, Karpenter, Cilium, Grafana, Prometheus, Loki, and Tempo, make sure you can talk about them fluently.
✨Showcase Your Problem-Solving Approach
Site reliability engineering is all about solving problems in production. Think about how you can illustrate your approach to a past challenge using the STAR method (Situation, Task, Action, Result). It's a great way to show how you tackle complex issues and come up with effective solutions.
✨Know the Business Impact of IT Solutions
When discussing your experiences, focus not just on the tech solutions you implemented, but also on their business impact. Employers want to see that you can connect IT with organisational goals. Prepare examples that highlight how your technical contributions improved reliability or efficiency, or reduced costs, in past roles or projects.
✨Prepare for Behavioural Questions
Since this role involves significant stakeholder interaction and close collaboration with engineers, expect behavioural questions that assess your interpersonal skills. Be prepared with examples that demonstrate your adaptability, your communication skills, and how you handle stakeholder feedback. Before the interview, think of situations where you worked closely with stakeholders to deliver changes safely and effectively.