Senior Site Reliability Engineer - UK in London

London | Full-Time | £60,000 to £80,000 per year (est.) | Home office (partial)
Heidi

At a Glance

  • Tasks: Join us in enhancing healthcare through AI, focusing on system reliability and incident response.
  • Company: Heidi is revolutionising healthcare with a human touch, backed by significant funding.
  • Benefits: Enjoy equity, comprehensive health cover, learning budgets, and flexible work options.
  • Other info: Embrace a culture of diversity, equity, and inclusion while growing your career.
  • Why this job: Make a real impact in healthcare while working with top talent in a dynamic environment.
  • Qualifications: 3-6+ years in SRE or operations-heavy roles, with cloud and Kubernetes experience.

The predicted salary is between £60,000 and £80,000 per year.

Heidi is building an AI Care Partner that supports clinicians every step of the way, from documentation to delivery of care. We exist to double healthcare’s capacity while keeping care deeply human. In 18 months, Heidi has returned more than 18 million hours to clinicians and supported over 73 million patient visits. Today, more than two million patient visits each week are powered by Heidi across 116 countries and over 110 languages.

Founded by clinicians, Heidi brings together clinicians, engineers, designers, scientists, creatives, and mathematicians, working with a shared purpose: to strengthen the human connection at the heart of healthcare. Backed by nearly $100 million in total funding, Heidi is expanding across the USA, UK, Canada, and Europe, partnering with major health systems including the NHS, Beth Israel Lahey Health, MaineGeneral, and Monash Health, among others.

This role sits in the core Platform/SRE team that owns production. You’ll work directly on incident response, on-call, system reliability, and day-to-day operations for Heidi’s platform. We’re open to candidates who are strong mid-level SREs ready to take on more ownership, as well as senior SREs who enjoy being hands-on in operations. The role is intentionally ops-heavy and focused on keeping real systems healthy in production.

What you’ll do:

  • Participate in on-call and incident response: Respond to production incidents, contribute to service restoration, and support clear communication during incidents. Over time, take increasing responsibility for leading incidents end-to-end.
  • Improve operational reliability: Identify recurring issues and reliability risks, and drive fixes through better alerting, automation, system changes, or process improvements.
  • Own parts of the production environment: Operate and improve Kubernetes clusters, cloud infrastructure, and core platform services, with growing ownership as familiarity increases.
  • Strengthen observability: Improve dashboards, alerts, logs, and traces so issues are detected earlier and diagnosed faster, with a strong focus on actionable signals.
  • Reduce operational toil: Automate repetitive tasks, simplify runbooks, and improve tooling to make on-call and day-to-day operations easier and safer.
  • Support safe change: Improve deployments, rollback mechanisms, and operational readiness to reduce the risk of incidents caused by change.
  • Contribute to operational practices: Write and maintain runbooks, participate in blameless post-mortems, and help improve incident response processes over time.
  • Collaborate closely with engineers: Work with product and feature teams to improve production readiness, service ownership, and reliability expectations.

What we’re looking for:

  • 3–6+ years in SRE, DevOps, Platform, or operations-heavy engineering roles.
  • Experience supporting production systems and participating in on-call rotations.
  • Comfortable debugging live systems under pressure.
  • Experience operating cloud infrastructure (AWS preferred).
  • Working knowledge of Kubernetes and containerised workloads.
  • Infrastructure as Code experience (Terraform or similar).
  • Familiarity with monitoring and alerting tools (Datadog, Prometheus, etc.).
  • Scripting or automation experience (Python, Bash, or similar).

Nice to have:

  • Experience leading incidents or mentoring others during on-call.
  • Experience in regulated or security-sensitive environments.
  • Familiarity with databases, queues, and caches in production.
  • Interest in reliability practices such as SLOs, error budgets, and capacity planning.

How We Work:

  • We own production: The Platform/SRE team is responsible for reliability and incident response.
  • Incidents are blameless: We focus on learning and improving systems, not assigning fault.
  • Practical over perfect: We prioritise improvements that reduce real operational pain.
  • Calm under pressure: Clear thinking and communication matter during incidents.

What do we believe in?

  • Heidi builds for the future of healthcare, not just the next quarter, and our goals are ambitious because the world’s health demands it.
  • Live Forever - Every release moves care forward: measured, safe, and built to last. Data guides us, but patients define the truth that matters.
  • Practice Ownership - Decisions follow logic and proof, not hierarchy. Exceptional care demands exceptional standards in our work, our thinking, and our character.
  • Small Cuts Heal Faster - Stability earns trust, speed delivers impact. Progress is about learning fast without breaking what people depend on.
  • Make Others Better - Feedback is direct, kindness is constant, and excellence lifts everyone. Our success is measured by collective growth, not individual output.

Our mission is clear: expand the world’s capacity to care, and do it without losing the humanity that makes care worth delivering.

Why you should join Heidi:

  • Real product momentum. We're not trying to generate interest; we're channelling it.
  • Equity from day one. When Heidi wins, you win. You'll share directly in the success you help create.
  • Unmatched impact. Play a pivotal role at a critical growth moment - working on a product that delivers tangible, real-world value to clinicians and patients every day.
  • Work alongside world-class talent. Join a team of operators and builders who've scaled unicorns.
  • Your health, covered. Comprehensive private medical and dental cover through Bupa, plus 24/7 mental health, coaching and wellbeing support through Sonder and a £100/month Healthy Heidi’s stipend.
  • Global parental leave. 26 weeks paid for primary carers and 18 weeks for secondary carers, subject to eligibility.
  • Fertility support. £7,000 one-off payment, subject to eligibility.
  • Learning & development. £700 per year for courses, books, memberships, conferences and more.
  • Home office budget. £500 one-off to set up a workspace you actually want to work in.
  • Recharge days after major milestones and busy periods so you can reset and come back strong.
  • Work from anywhere for up to 4 weeks per year, wherever the world takes you.
  • Clinical leave. 10 days per year for eligible clinical roles to maintain accreditation and requirements.
  • Flexibility that works. A hybrid environment, with 3 days in the office.

Heidi’s commitment to Diversity, Equity and Inclusion:

Heidi is dedicated to creating an equitable, inclusive, and supportive work environment that brings people together from diverse backgrounds, experiences, and perspectives. Our strength is in our differences. We're proud to be an equal opportunity employer and welcome all applicants, as we're committed to promoting a culture of opportunity for all.

Senior Site Reliability Engineer - UK in London employer: Heidi

Heidi is an exceptional employer that champions innovation in healthcare while fostering a collaborative and inclusive work culture. With a strong focus on employee growth, offering comprehensive benefits such as equity from day one, extensive learning opportunities, and flexible working arrangements, Heidi empowers its team to make a meaningful impact on the future of care. Located in the UK, employees enjoy a supportive environment that prioritises well-being and professional development, making it an ideal place for those looking to thrive in their careers.

Contact Details:

Heidi Recruitment Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Senior Site Reliability Engineer - UK role in London

Tip Number 1

Network like a pro! Reach out to folks in the industry, attend meetups, and connect with current Heidi employees on LinkedIn. A personal touch can make all the difference when it comes to landing that interview.

Tip Number 2

Show off your skills! If you’ve got a GitHub or portfolio, make sure it’s up to date. Share projects that highlight your SRE experience, especially those involving Kubernetes or cloud infrastructure. We love seeing what you can do!

Tip Number 3

Prepare for the interview by brushing up on incident response scenarios. Think about how you’d handle real-life situations and be ready to discuss your thought process. We want to see how you think under pressure!

Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, we’re always looking for passionate individuals who align with our mission to improve healthcare.

We think you need these skills to ace the Senior Site Reliability Engineer - UK role in London

Incident Response
System Reliability
Kubernetes
Cloud Infrastructure (AWS preferred)
Infrastructure as Code (Terraform or similar)
Monitoring and Alerting Tools (Datadog, Prometheus, etc.)
Scripting or Automation (Python, Bash, or similar)

Some tips for your application 🫡

Tailor Your Application: Make sure to customise your CV and cover letter for the Senior Site Reliability Engineer role. Highlight your relevant experience in SRE, DevOps, or operations-heavy engineering roles, and don’t forget to mention any specific tools or technologies you’ve worked with that align with what we’re looking for.

Showcase Your Problem-Solving Skills: We love candidates who can think on their feet! In your application, share examples of how you've tackled production incidents or improved system reliability. This will show us that you’re calm under pressure and ready to take ownership of challenges.

Be Clear and Concise: When writing your application, keep it straightforward. Use clear language and avoid jargon where possible. We appreciate a well-structured application that gets straight to the point, making it easy for us to see why you’d be a great fit for Heidi.

Apply Through Our Website: Don’t forget to submit your application through our website! It’s the best way for us to receive your details and ensures you’re considered for the role. Plus, it shows you’re keen to join our team at Heidi!

How to prepare for a job interview at Heidi

Know Your Stuff

Make sure you brush up on your SRE fundamentals, especially around incident response and system reliability. Be ready to discuss your experience with Kubernetes, cloud infrastructure, and any monitoring tools you've used. This is your chance to show how your skills align with what Heidi is looking for!

Show Your Problem-Solving Skills

Prepare to share specific examples of how you've tackled production incidents in the past. Think about times when you had to debug live systems under pressure or automate repetitive tasks. Highlighting these experiences will demonstrate your ability to thrive in a fast-paced environment.

Emphasise Collaboration

Heidi values teamwork, so be ready to talk about how you've worked closely with engineers and product teams in previous roles. Discuss how you’ve contributed to improving operational practices and how you handle feedback. This will show that you're not just a lone wolf but a team player who can help strengthen the human connection in healthcare.

Ask Thoughtful Questions

Prepare some insightful questions about Heidi's approach to reliability practices, incident response processes, or their culture of learning from incidents. This shows your genuine interest in the role and helps you gauge if the company aligns with your values and career goals.