Senior Site Reliability Engineer in London

London | Full-Time | £60,000 - £80,000 / year (est.) | Home office (partial)
Dormont Manufacturing Co

At a Glance

  • Tasks: Join our SRE team to ensure system reliability and respond to production incidents.
  • Company: Heidi, a fast-growing tech company focused on impactful healthcare solutions.
  • Benefits: Equity from day one, personal development budget, and wellness days.
  • Other info: Hybrid work environment with a focus on diversity and inclusion.
  • Why this job: Make a real impact in healthcare while working with world-class talent.
  • Qualifications: 3-6+ years in SRE or operations-heavy roles, with cloud and Kubernetes experience.

The predicted salary is between £60,000 and £80,000 per year.

This role sits in the core Platform/SRE team that owns production. You’ll work directly on incident response, on-call, system reliability, and day-to-day operations for Heidi’s platform. We’re open to candidates who are strong mid-level SREs ready to take on more ownership, as well as senior SREs who enjoy being hands-on in operations. The role is intentionally ops-heavy and focused on keeping real systems healthy in production.

What you’ll do

  • Participate in on-call and incident response: Respond to production incidents, contribute to service restoration, and support clear communication during incidents. Over time, take increasing responsibility for leading incidents end-to-end.
  • Improve operational reliability: Identify recurring issues and reliability risks, and drive fixes through better alerting, automation, system changes, or process improvements.
  • Own parts of the production environment: Operate and improve Kubernetes clusters, cloud infrastructure, and core platform services, with growing ownership as familiarity increases.
  • Strengthen observability: Improve dashboards, alerts, logs, and traces so issues are detected earlier and diagnosed faster, with a strong focus on actionable signals.
  • Reduce operational toil: Automate repetitive tasks, simplify runbooks, and improve tooling to make on-call and day-to-day operations easier and safer.
  • Support safe change: Improve deployments, rollback mechanisms, and operational readiness to reduce the risk of incidents caused by change.
  • Contribute to operational practices: Write and maintain runbooks, participate in blameless post-mortems, and help improve incident response processes over time.
  • Collaborate closely with engineers: Work with product and feature teams to improve production readiness, service ownership, and reliability expectations.

What we’re looking for

  • 3–6+ years in SRE, DevOps, Platform, or operations-heavy engineering roles.
  • Experience supporting production systems and participating in on-call rotations.
  • Comfortable debugging live systems under pressure.
  • Experience operating cloud infrastructure (AWS preferred).
  • Working knowledge of Kubernetes and containerised workloads.
  • Infrastructure as Code experience (Terraform or similar).
  • Familiarity with monitoring and alerting tools (Datadog, Prometheus, etc.).
  • Scripting or automation experience (Python, Bash, or similar).

Nice to have

  • Experience leading incidents or mentoring others during on-call.
  • Experience in regulated or security-sensitive environments.
  • Familiarity with databases, queues, and caches in production.
  • Interest in reliability practices such as SLOs, error budgets, and capacity planning.

Why you should join Heidi

  • Real product momentum. We’re not trying to generate interest; we’re channeling it.
  • Equity from day one. When Heidi wins, you win. You’ll share directly in the success you help create.
  • Unmatched impact. Play a pivotal role in defining and scaling customer success at a critical growth moment - all while working on a product that delivers tangible value to clinicians and patients every day.
  • Work alongside world-class talent. Join a team of operators and builders who’ve scaled unicorns.
  • Global reach. Help shape our international expansion as we bring Heidi to key international markets.
  • Growth and balance. Enjoy a personal development budget, work from anywhere for a month, dedicated wellness days, and your birthday off to recharge.
  • Flexibility that works. A hybrid environment, with 3 days in the office.

Heidi’s commitment to Diversity, Equity and Inclusion

Heidi is dedicated to creating an equitable, inclusive, and supportive work environment that brings people together from diverse backgrounds, experiences, and perspectives. Our strength is in our differences. We’re proud to be an equal opportunity employer and welcome all applicants, as we’re committed to promoting a culture of opportunity for all.

Senior Site Reliability Engineer in London, employer: Dormont Manufacturing Co

Heidi is an exceptional employer that offers a dynamic work environment where you can make a real impact on production systems while collaborating with world-class talent. With a strong focus on employee growth, you will benefit from a personal development budget, flexible working arrangements, and a commitment to diversity and inclusion. Join us to not only advance your career but also to share in the success of a product that delivers tangible value to clinicians and patients every day.

Contact Details:

Dormont Manufacturing Co Recruitment Team

StudySmarter Expert Advice🤫

We think this is how you could land the Senior Site Reliability Engineer role in London

Tip Number 1

Get your hands dirty! When you're applying for a Senior Site Reliability Engineer role, make sure to showcase your practical experience. Talk about the incidents you've managed and how you improved system reliability. We want to see that you can handle real-world challenges.

Tip Number 2

Network like a pro! Connect with current SREs or engineers in similar roles on LinkedIn. Ask them about their experiences at companies like Heidi. This not only gives you insights but also shows your genuine interest in the field. Plus, referrals can give you a leg up!

Tip Number 3

Be ready to talk tech! Brush up on your knowledge of Kubernetes, cloud infrastructure, and monitoring tools. During interviews, we love to hear about your hands-on experience and how you've tackled operational challenges. Show us your passion for keeping systems healthy!

Tip Number 4

Apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you're serious about joining our team. Don’t forget to highlight your unique skills and experiences that align with our needs!

We think you need these skills to ace the Senior Site Reliability Engineer role in London

Incident Response
On-call Support
System Reliability
Kubernetes
Cloud Infrastructure (AWS preferred)
Infrastructure as Code (Terraform or similar)
Monitoring and Alerting Tools (Datadog, Prometheus, etc.)

Some tips for your application 🫡

Tailor Your Application: Make sure to customise your CV and cover letter for the Senior Site Reliability Engineer role. Highlight your experience with production systems, incident response, and any relevant tools like Kubernetes or AWS. We want to see how your skills align with what we’re looking for!

Show Your Passion for Reliability: In your application, express your enthusiasm for operational reliability and improving systems. Share examples of how you've tackled reliability issues in the past or how you’ve contributed to incident response. This will help us see your commitment to keeping systems healthy.

Be Clear and Concise: When writing your application, keep it straightforward and to the point. Use bullet points where possible to make your achievements stand out. We appreciate clarity, especially when it comes to your experience and skills related to SRE practices.

Apply Through Our Website: Don’t forget to submit your application through our website! It’s the best way for us to receive your details and ensures you’re considered for the role. Plus, it helps us keep everything organised on our end.

How to prepare for a job interview at Dormont Manufacturing Co

Know Your Stuff

Make sure you brush up on your SRE fundamentals, especially around incident response and system reliability. Be ready to discuss your experience with Kubernetes, cloud infrastructure, and any monitoring tools you've used. The more specific examples you can provide, the better!

Show Your Problem-Solving Skills

Prepare to talk about how you've tackled production incidents in the past. Think of a few scenarios where you identified issues, implemented fixes, or improved processes. This will demonstrate your hands-on experience and ability to think on your feet.

Emphasise Collaboration

Since this role involves working closely with engineers and product teams, be ready to share examples of how you've collaborated in previous roles. Highlight any experiences where you contributed to improving production readiness or service ownership.

Ask Insightful Questions

Prepare some thoughtful questions about the company's operational practices, team dynamics, or their approach to reliability. This shows your genuine interest in the role and helps you gauge if it's the right fit for you.