Site Reliability Engineering Manager

London Full-Time No home office possible

Job Description

Job Title: Site Reliability Engineer – Manager

Location: Hybrid Remote – London EC2M

Contract (12 months)

Rate: Outside IR35 – £300 to £330 Per Day

About the Role:

We are partnering with one of the top companies in the mobile industry to hire a Site Reliability Engineer (SRE) Manager. In this role, you will collaborate with cross-functional teams to drive the design, development, and delivery of high-performing, scalable, and reliable infrastructure and services. You’ll be responsible for building robust systems, automating operations, and enhancing observability and deployment pipelines for modern cloud-native applications.

Key Responsibilities:

  • System Reliability & Performance: Maintain and scale critical services and infrastructure. Identify performance bottlenecks and work closely with product engineers to optimise applications.
  • Kubernetes Operations: Administer, scale, and troubleshoot clusters in GKE, EKS, or other Kubernetes environments.
  • Infrastructure as Code (IaC): Design and maintain scalable infrastructure using Terraform and automate deployments across public, private, or hybrid clouds (mainly AWS).
  • CI/CD Pipeline Enhancement: Build and improve robust CI/CD pipelines to support fast and safe deployment cycles.
  • Observability & Monitoring: Implement code-based instrumentation and telemetry. Ensure systems are observable with tools for logging, metrics, and alerting.
  • Automation & Scripting: Write tooling and automation scripts in Python, Go, or Rust to reduce toil and manual intervention.
  • Storage & Networking: Manage and optimise storage services like Amazon S3 or Google Cloud Storage (GCS). Resolve complex networking issues in multi-cloud environments.

Essential Requirements:

  • 5+ years of hands-on experience as a Site Reliability Engineer.
  • Proven expertise in Kubernetes (GKE/EKS).
  • Strong proficiency in Python, Go, or Rust.
  • Solid experience with AWS and Infrastructure as Code using Terraform.
  • Deep understanding of Linux internals, standard networking protocols, and distributed systems architecture.
  • Hands-on experience with automation and performance optimisation.
  • Strong knowledge of SRE principles and methodologies.
  • Experience with observability tools and telemetry systems.
  • Exposure to Google Cloud Platform (GCP).
  • Familiarity with hybrid or multi-cloud architecture.
  • Experience with service meshes or edge proxies (e.g., Envoy, Istio).
  • Working knowledge of container security best practices.

Contact Details:

TECEZE Recruiting Team
