Senior Site Reliability Engineer in Cambridge

Cambridge | Full-Time | Home office (partial)

At a Glance

Tasks: Enhance reliability and observability of a cloud-native SaaS platform.
Company: Join a dynamic team focused on operational excellence in tech.
Benefits: Competitive salary, flexible work options, and growth opportunities.
Other info: Perfect for those passionate about improving system reliability in a fast-paced environment.
Why this job: Tackle real-world challenges and make a significant impact in tech.
Qualifications: Experience with Kubernetes, AWS, and strong automation skills required.

We’re hiring a Senior Site Reliability Engineer to help strengthen the reliability, observability and operational maturity of a cloud‑native SaaS platform operating within a regulated environment. This is a hands‑on role focused on production systems, monitoring, incident response, automation and operational excellence across a Kubernetes‑based AWS platform. You’ll work closely with Platform Engineering and Application teams to improve system health, reduce operational risk and build scalable reliability practices as the business continues to grow.

Key responsibilities

Building and improving observability across metrics, logs and traces
Developing actionable dashboards, alerts, runbooks and operational tooling
Supporting production systems, incident response and root cause analysis
Improving reliability, resilience, deployment feedback loops and operational readiness
Identifying operational inefficiencies and automating repetitive toil
Driving post‑incident reviews and long‑term corrective improvements
Helping define SLOs, SLIs and reliability standards across customer‑critical services

Tech environment includes

AWS
Kubernetes / EKS
Observability
Prometheus
Grafana
OpenTelemetry
GitOps
Argo CD
CI/CD
Cloud Operations

We’re looking for someone with

Strong experience supporting Kubernetes‑based production environments
Practical AWS and cloud‑native infrastructure knowledge
Experience with observability, monitoring and incident management
Strong scripting or automation capability (Python, Go, Bash, TypeScript etc.)
Calm, pragmatic thinking during live operational incidents
Passion for improving reliability and reducing operational noise
Experience within SaaS, fintech or regulated environments would be highly beneficial

This is an excellent opportunity for an engineer who enjoys solving real production challenges, improving operational resilience and building mature SRE practices within a scaling engineering organisation.

Employer: SoCode Recruitment

Join a forward-thinking company that prioritises innovation and operational excellence, offering a dynamic work culture where collaboration and continuous learning are at the forefront. As a Senior Site Reliability Engineer, you'll benefit from a supportive environment that encourages professional growth, with access to cutting-edge technologies and the opportunity to make a tangible impact on a cloud-native SaaS platform. Located in a vibrant tech hub, this role provides unique advantages such as networking opportunities and a strong community of like-minded professionals dedicated to enhancing system reliability and performance.

Contact Details:

SoCode Recruitment, Recruitment Team

StudySmarter Expert Advice🤫

How you could land the Senior Site Reliability Engineer role in Cambridge

✨Tip Number 1

Network like a pro! Reach out to current employees on LinkedIn or join relevant tech communities. A friendly chat can give you insider info and maybe even a referral!

✨Tip Number 2

Show off your skills! Prepare a mini-project or a case study that highlights your experience with Kubernetes, AWS, and observability tools. This can really set you apart during interviews.

✨Tip Number 3

Be ready for hands-on challenges! Brush up on your incident response strategies and automation skills. We love seeing candidates who can think on their feet and tackle real-world problems.

✨Tip Number 4

Apply through our website! It’s the best way to ensure your application gets noticed. Plus, it shows you’re genuinely interested in joining our team at StudySmarter.

Skills you need to ace the Senior Site Reliability Engineer role in Cambridge

Kubernetes

AWS

Observability

Monitoring

Incident Management

Scripting

Automation

Python

Bash

TypeScript

Production Systems Support

Root Cause Analysis

Operational Readiness

SLOs and SLIs Definition

Some tips for your application 🫡

Tailor Your CV: Make sure your CV highlights your experience with Kubernetes and AWS, as these are key for the role. We want to see how your skills align with our needs, so don’t be shy about showcasing your relevant projects!

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you’re passionate about Site Reliability Engineering and how you can contribute to our cloud-native SaaS platform. Let us know what excites you about the role!

Showcase Your Problem-Solving Skills: In your application, share specific examples of how you've tackled production challenges or improved system reliability in the past. We love hearing about your hands-on experience and how you’ve made a difference!

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for this exciting opportunity. Plus, it’s super easy!

How to prepare for a job interview at SoCode Recruitment

✨Know Your Tech Inside Out

Make sure you’re well-versed in the technologies mentioned in the job description, especially Kubernetes and AWS. Brush up on your knowledge of observability tools like Prometheus and Grafana, as well as scripting languages like Python or Go. Being able to discuss these confidently will show that you’re ready for the hands-on nature of the role.

✨Prepare Real-World Examples

Think of specific instances where you've improved system reliability or handled incident responses. Be ready to share stories about how you’ve automated processes or built dashboards. This not only demonstrates your experience but also shows your problem-solving skills in action.

✨Understand the Company’s Challenges

Research the company’s SaaS platform and any known challenges they face in a regulated environment. This will help you tailor your answers to show how you can contribute to their operational excellence and reliability practices. It’s all about showing that you’re not just a fit for the role, but also for the company’s mission.

✨Ask Insightful Questions

Prepare thoughtful questions about their current SRE practices, incident management processes, or how they define SLOs and SLIs. This shows your genuine interest in the role and helps you gauge if the company aligns with your career goals. Plus, it opens up a dialogue that can make you more memorable.
