Levy Global

Site Reliability Engineer (SRE)

Slough, Full-Time, No home office possible

Site Reliability Engineer (Observability)

London - Hybrid / 3 Days

Contract, Inside IR35 - 6 months initially

We’re looking for a Site Reliability Engineer (SRE) to join our client to build and maintain observability systems and to ensure their core services remain reliable, scalable, and high-performing.

Responsibilities:

Deploy and manage observability tools using a Prometheus-like metrics store and Grafana Enterprise.
Automate monitoring, alerting, and incident response.
Build Grafana dashboards for system insights.
Apply Infrastructure as Code (IaC) principles.
Develop tooling in Golang (preferred) or Python.
Advocate for SRE principles like SLOs, SLIs, and error budgets.
Integrate monitoring with incident management workflows.

Requirements:

SRE principles and reliability engineering expertise.
Solid familiarity with Linux.
Strong experience building and deploying containers using Podman or Docker.
Golang (preferred) or Python for automation and API integration.
Experience with Grafana, VictoriaMetrics, and PromQL.
Experience deploying and managing centralized logging solutions.
Strong Infrastructure as Code (IaC) knowledge.

Nice to Have:

OpenTelemetry experience.
Terraform, Ansible, or CI/CD knowledge.
Background in datacentre and compute hardware services.
AWS infrastructure configuration and deployment.
Familiarity with Kubernetes and cloud-native systems.
Incident response automation expertise.

Contact Details:

Levy Global Recruiting Team

Application deadline: 2027-04-21