Lyrebird Health

Staff Site Reliability Engineer

Full-Time | £48,000 - £72,000 / year (est.) | Home office (partial)

At a Glance

Tasks: Design and evolve systems for reliability, scalability, and security at Lyrebird Health.
Company: Lyrebird Health is revolutionising healthcare by automating clinicians' tasks.
Benefits: High ownership role with direct impact on product reliability and meaningful problem-solving.
Why this job: Shape the future of healthcare technology while ensuring fast, secure, and reliable services.
Qualifications: 8+ years in engineering, strong SRE experience, and leadership in cross-team initiatives.
Other info: Diverse team culture encouraging applicants from underrepresented backgrounds.

The predicted salary is between £48,000 and £72,000 per year.

We’re looking for a Staff Site Reliability Engineer (SRE) to raise the reliability, scalability, and security bar across the Lyrebird platform. This is a senior, high-impact role focused on designing and evolving the systems and practices that keep Lyrebird fast, safe, and available. You’ll work across infrastructure, application reliability, observability, incident response, and platform enablement - partnering closely with Engineering, Security, and Product. This is not a “keep the lights on” role. You’ll drive meaningful improvements to how we build, deploy, and operate our services in production - with real autonomy and ownership.

About Lyrebird Health: Lyrebird Health is transforming the quality and accessibility of healthcare by automating clinicians’ most time-consuming tasks. Thousands of clinicians across many disciplines already use Lyrebird — and that number is growing every day. They trust us to deliver a fast, reliable, and secure experience. We value that trust above all else and strive to earn it while continuing to amaze our users.

What You’ll Do

Reliability & Production Engineering
- Own reliability outcomes across core services and customer-facing systems
- Define, implement, and evolve SLOs/SLIs, alerting strategy, and error budgets
- Lead initiatives to improve uptime, latency, and overall system resilience
- Proactively identify reliability risks and drive mitigation plans to completion
Observability & Incident Response
- Improve end-to-end observability (metrics, logs, traces) so issues are detected early and diagnosed quickly
- Lead incident response for high-severity events and guide teams through calm, effective mitigation
- Drive post-incident reviews that result in measurable, lasting improvements
- Build a culture of operational excellence: fewer incidents, faster recovery, better learning
Platform Enablement
- Develop internal tooling and paved paths that make “doing the right thing” the easiest option
- Improve the developer experience around deployments, rollbacks, environment consistency, and service ownership
- Partner with engineers to uplift production-readiness across new and existing services
Infrastructure & Automation
- Improve infrastructure reliability and maintainability using Infrastructure as Code
- Strengthen deployment workflows and reduce operational toil through automation
- Help shape architecture decisions with a reliability and scalability lens
Security & Compliance Support
- Embed security and compliance principles into platform practices (access controls, auditability, safe-by-default designs)
- Work closely with Security and Engineering leadership to support regulatory and enterprise requirements without slowing down delivery

What We’re Looking For

8+ years of engineering experience, with significant depth in SRE, platform, or production systems
Strong experience operating and improving systems in production (including incident response)
Proven ability to lead cross-team initiatives and influence engineering standards

Technical Strength

You don’t need to tick every box, but you should be strong across most:
Cloud/Infrastructure
- AWS (ECS, EC2, VPC, IAM, RDS/Aurora, S3, CloudWatch)
- Infrastructure as Code (Terraform)
Observability
- Strong grasp of monitoring and alerting principles
- Experience with logs + metrics + tracing and building meaningful dashboards
- Familiar with OpenTelemetry and modern observability tooling

Systems & Operational Excellence

Knowledge of reliability patterns: graceful degradation, retries, backoff, timeouts, load shedding, capacity planning
Strong debugging instincts across distributed systems
Practical approach to risk management and tradeoffs

Software Engineering

Ability to build tools and automation (TypeScript, Go, Python, or similar)
Familiarity with CI/CD and safe rollout strategies (feature flags, canary, blue/green)

Bonus Skills (Nice to Have)

Experience supporting security frameworks (SOC 2, ISO 27001, HIPAA-style environments)
Experience with service mesh patterns, multi-account AWS environments, or multi-region design
Experience working with healthcare or regulated domains
Experience scaling engineering org practices as the company grows

Who You Are

You’re deeply accountable - you take ownership of outcomes, not just tasks
You value simplicity and reliability over cleverness
You’re calm and effective in incidents, and you raise the quality bar afterward
You communicate clearly across engineering and non-engineering stakeholders
You’re pragmatic: you know when to move fast, and when to slow down to reduce risk

Why This Role Is Different

Staff-level scope with real influence across engineering
Direct impact on reliability for a product clinicians depend on every day
Work on meaningful problems where security, performance, and trust matter
High ownership environment with room to shape how the company operates at scale

At Lyrebird, you won’t just respond to incidents - you’ll design the systems and standards that prevent them. We’re building a team that reflects the diversity of the people who’ll benefit from our work. If you’re from an underrepresented background in tech, we especially encourage you to apply - even if you don’t meet every single requirement.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.

Staff Site Reliability Engineer employer: Lyrebird Health

At Lyrebird Health, we pride ourselves on being an exceptional employer that fosters a culture of innovation and accountability. Our team enjoys a collaborative work environment where autonomy is encouraged, and meaningful contributions are recognised, particularly in our mission to enhance healthcare accessibility. With ample opportunities for professional growth and a commitment to diversity, we empower our employees to shape the future of healthcare technology while enjoying a supportive and dynamic workplace.

Contact Details:

Lyrebird Health Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Staff Site Reliability Engineer role

✨Tip Number 1

Network like a pro! Reach out to folks in your industry on LinkedIn or at meetups. A friendly chat can lead to opportunities that aren’t even advertised yet.

✨Tip Number 2

Show off your skills! Create a portfolio or GitHub repo showcasing your projects and contributions. This gives potential employers a taste of what you can do beyond your CV.

✨Tip Number 3

Prepare for interviews by practising common SRE scenarios. Think about how you’d handle incidents or improve system reliability. We want to see your thought process!

✨Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, we love seeing candidates who are proactive!

We think you need these skills to ace the Staff Site Reliability Engineer role

Site Reliability Engineering (SRE)

Production Systems Management

Incident Response

Service Level Objectives (SLOs)

Service Level Indicators (SLIs)

Observability

Infrastructure as Code (Terraform)

Cloud Infrastructure (AWS)

Monitoring and Alerting Principles

Debugging Distributed Systems

Automation (TypeScript, Go, Python)

CI/CD Practices

Security and Compliance Principles

Risk Management

Cross-Team Collaboration

Some tips for your application 🫡

Show Your Passion for Reliability: When writing your application, let us see your enthusiasm for reliability and system performance. Share specific examples of how you've improved uptime or resilience in past roles – we love to hear about your hands-on experience!

Tailor Your Application: Make sure to customise your application to highlight the skills and experiences that align with our job description. We want to see how your background in SRE or production systems makes you a perfect fit for our team at Lyrebird.

Be Clear and Concise: Keep your application straightforward and to the point. Use clear language to describe your achievements and avoid jargon unless it’s relevant. We appreciate clarity as much as you do!

Apply Through Our Website: Don’t forget to submit your application through our website! It’s the best way for us to receive your details and ensures you’re considered for the role. We can’t wait to see what you bring to the table!

How to prepare for a job interview at Lyrebird Health

✨Know Your Stuff

Make sure you brush up on your technical skills, especially around SRE principles and the tools mentioned in the job description. Be ready to discuss your experience with AWS, Infrastructure as Code, and observability tools. This is your chance to show how your background aligns with what they need!

✨Showcase Your Problem-Solving Skills

Prepare to share specific examples of how you've tackled reliability issues or improved system performance in the past. Use the STAR method (Situation, Task, Action, Result) to structure your answers. They want to see your thought process and how you handle challenges.

✨Communicate Clearly

Since this role involves working closely with various teams, practice explaining complex technical concepts in simple terms. Think about how you would communicate with non-engineering stakeholders. Clear communication can set you apart from other candidates.

✨Demonstrate Ownership and Accountability

Be prepared to discuss times when you took ownership of a project or incident. Highlight how you ensured successful outcomes and learned from any mistakes. This role values accountability, so showing that you take responsibility for your work will resonate well with them.

Lyrebird Health

50-100
