Site Reliability Engineer

Amber Labs

Full-Time | £60,000 - £75,000 per year (est.) | Remote-first

At a Glance

Tasks: Join us as a Site Reliability Engineer to enhance data platform reliability and performance.
Company: Amber Labs, a fast-growing digital transformation consultancy in the public sector.
Benefits: Enjoy 25 days annual leave, private medical insurance, and a personal training budget.
Other info: Remote-first work culture with excellent career growth opportunities and team socials.
Why this job: Make a real impact on critical data services while working with cutting-edge technology.
Qualifications: Experience in Site Reliability Engineering and strong collaboration skills required.

The predicted salary is between £60,000 and £75,000 per year.

Amber Labs is a fast-growing digital transformation consultancy delivering complex data and technology solutions across the public sector. We are looking for an experienced Site Reliability Engineer (SRE) to join our team and support a high-profile, security-cleared programme focused on critical data and platform services. This role will focus on improving the reliability, observability and performance of large-scale data platforms and services. You'll work closely with architects, developers, platform engineers and stakeholders to define reliability objectives, drive automation, and ensure services operate effectively at scale.

As an SRE, you will be responsible for embedding reliability engineering principles across a complex data and platform landscape. You will help establish and measure service reliability targets, improve observability, lead root cause investigations, and identify opportunities to automate operational activities. This is an excellent opportunity for someone who enjoys solving complex operational challenges, working across multiple teams, and driving continuous service improvement within a highly secure environment.

Key Responsibilities

Define, implement and champion Site Reliability Engineering practices across critical services and platforms
Collaborate with architects, developers and platform teams to design and maintain Service Level Indicators (SLIs), Service Level Objectives (SLOs) and Service Level Agreements (SLAs)
Monitor and manage error budgets, ensuring reliability targets are understood and achieved across engineering teams
Improve service observability through effective monitoring, alerting and reporting capabilities
Conduct and facilitate Root Cause Analysis (RCA) and Post-Incident Reviews (PIRs), driving meaningful improvements and preventative actions
Build and manage an SRE improvement backlog, identifying opportunities for automation and operational excellence
Support the reliability, availability and performance of data platforms, infrastructure and data pipelines
Drive continuous improvement initiatives that reduce operational overhead and improve service resilience
Work with technical and business stakeholders to establish meaningful service health metrics and reporting

Requirements

Strong experience applying Site Reliability Engineering principles within complex production environments
Experience with observability and monitoring practices
Experience conducting Root Cause Analysis (RCA)
Experience designing and implementing reliability frameworks and operational excellence practices
Hands-on experience with Dynatrace, Kubernetes and Helm
Experience developing automation solutions to improve reliability and reduce manual operational effort
Strong stakeholder management and collaboration skills, with the ability to work effectively across engineering and architecture teams
Experience supporting cloud-native platforms and services
Active SC Clearance that has been used within the last 6–12 months
Ability to operate at SFIA Level 4
Experience working with data platforms, data engineering teams or large-scale data ecosystems
Understanding of data pipelines and their operational challenges
Experience supporting platform engineering or infrastructure teams
Experience working within secure public sector or regulated environments
Familiarity with cloud platforms such as AWS, Azure or GCP

What We Offer

25 days annual leave plus public holidays, giving you time to properly switch off and recharge
Private medical insurance with Bupa
Remote-first working, with access to our Liverpool Street office when you want to collaborate in person
Personal training budget to support your professional development
Perkbox membership, with access to discounts across retail, travel, dining, wellness and entertainment
Electric Vehicle Scheme after one year of service
Regular team socials and opportunities to connect across the business
Employer pension contributions
Referral scheme offering up to £3,000 for successful hires
The opportunity to join a growing consultancy early and genuinely influence its direction and success

Amber Labs is a specialist digital consultancy delivering technology, data and transformation services across complex and highly regulated environments. We work with organisations tackling some of the UK's most challenging digital programmes, helping them deliver reliable, scalable and user-focused solutions.

Site Reliability Engineer employer: Amber Labs

Amber Labs is an exceptional employer, offering a dynamic work culture that prioritises collaboration and innovation within the digital transformation space. With a strong focus on employee growth, we provide a personal training budget, remote-first working options, and regular team socials, ensuring a supportive environment where you can thrive. Join us in shaping the future of technology solutions in the public sector while enjoying competitive benefits like private medical insurance and an electric vehicle scheme.

Contact Details:

Amber Labs Recruitment Team

Skills we think you need to succeed as a Site Reliability Engineer

Site Reliability Engineering (SRE)

Service Level Indicators (SLIs)

Service Level Objectives (SLOs)

Service Level Agreements (SLAs)

Root Cause Analysis (RCA)

Post-Incident Reviews (PIRs)

Automation Solutions

Observability and Monitoring Practices

Dynatrace

Kubernetes

Helm

Cloud-native Platforms

AWS

Azure

GCP
