Site Reliability Engineer (SRE)

Slough, Full-Time, No home office possible

Site Reliability Engineer (Observability)

London - Hybrid / 3 Days

Contract, Inside IR35 - 6 Months initially

We’re looking for a Site Reliability Engineer (SRE) to join our client, building and maintaining observability systems and ensuring their core services remain reliable, scalable, and high-performing.

Responsibilities:

  • Deploy and manage observability tools using a Prometheus-like metrics store and Grafana Enterprise.
  • Automate monitoring, alerting, and incident response.
  • Build Grafana dashboards for system insights.
  • Apply Infrastructure as Code (IaC) principles.
  • Develop tooling in Golang (preferred) or Python.
  • Advocate for SRE principles like SLOs, SLIs, and error budgets.
  • Integrate monitoring with incident management workflows.

Requirements:

  • SRE principles and reliability engineering expertise.
  • Solid familiarity with Linux.
  • Strong experience building and deploying containers using Podman or Docker.
  • Golang (preferred) or Python for automation and API integration.
  • Experience with Grafana, VictoriaMetrics, and PromQL.
  • Experience deploying and managing centralized logging solutions.
  • Strong Infrastructure as Code (IaC) knowledge.

Nice to Have:

  • OpenTelemetry experience.
  • Terraform, Ansible, or CI/CD knowledge.
  • Background in datacentre and compute hardware services.
  • AWS infrastructure configuration and deployment.
  • Familiarity with Kubernetes and cloud-native systems.
  • Incident response automation expertise.

Contact Details:

Levy Global Recruiting Team

Application deadline: 2027-04-21
