Site Reliability Engineering Manager (London)

London | Temporary | £60,000 to £84,000 per year (est.) | Home office (partial)

At a Glance

  • Tasks: Lead the design and delivery of scalable, reliable infrastructure and services.
  • Company: Join a top player in the mobile industry, shaping the future of technology.
  • Benefits: Enjoy hybrid remote work, competitive pay, and opportunities for professional growth.
  • Why this job: Be part of a dynamic team, driving innovation and making a real impact in tech.
  • Qualifications: 5+ years in Site Reliability Engineering with strong skills in Kubernetes, Python, and AWS.
  • Other info: This is a 12-month contract role, offering flexibility and a chance to work on cutting-edge projects.

The predicted salary is between £60,000 and £84,000 per year.

Location: Hybrid Remote – London EC2M

Contract: 12 months

Rate: Outside IR35 - £300 to £330 per day

About the Role:

We are partnering with one of the top companies in the mobile industry to hire a Site Reliability Engineering (SRE) Manager. In this role, you will collaborate with cross-functional teams to drive the design, development, and delivery of high-performing, scalable, and reliable infrastructure and services. You’ll be responsible for building robust systems, automating operations, and enhancing observability and deployment pipelines for modern cloud-native applications.

Key Responsibilities:

  • System Reliability & Performance: Maintain and scale critical services and infrastructure. Identify performance bottlenecks and work closely with product engineers to optimise applications.
  • Kubernetes Operations: Administer, scale, and troubleshoot clusters in GKE, EKS, or other Kubernetes environments.
  • Infrastructure as Code (IaC): Design and maintain scalable infrastructure using Terraform and automate deployments across public, private, or hybrid clouds (mainly AWS).
  • CI/CD Pipeline Enhancement: Build and improve robust CI/CD pipelines to support fast and safe deployment cycles.
  • Observability & Monitoring: Implement code-based instrumentation and telemetry. Ensure systems are observable with tools for logging, metrics, and alerting.
  • Automation & Scripting: Write tooling and automation scripts in Python, Go, or Rust to reduce toil and manual intervention.
  • Storage & Networking: Manage and optimise storage services like Amazon S3 or Google Cloud Storage (GCS). Resolve complex networking issues in multi-cloud environments.

Essential Requirements:

  • 5+ years of hands-on experience as a Site Reliability Engineer.
  • Proven expertise in Kubernetes (GKE/EKS).
  • Strong proficiency in Python, Go, or Rust.
  • Solid experience with AWS and Infrastructure as Code using Terraform.
  • Deep understanding of Linux internals, standard networking protocols, and distributed systems architecture.
  • Hands-on experience with automation and performance optimisation.
  • Strong knowledge of SRE principles and methodologies.
  • Experience with observability tools and telemetry systems.
  • Exposure to Google Cloud Platform (GCP).
  • Familiarity with hybrid or multi-cloud architecture.
  • Experience with service meshes or edge proxies (e.g., Envoy, Istio).
  • Working knowledge of container security best practices.

Site Reliability Engineering Manager (London) employer: TECEZE

Join a leading player in the mobile industry as a Site Reliability Engineering Manager in London, where innovation meets collaboration. Our hybrid work culture promotes flexibility and work-life balance, while our commitment to employee growth ensures you have access to continuous learning opportunities and career advancement. With competitive rates and a focus on cutting-edge technology, this role offers a unique chance to make a significant impact in a dynamic environment.

Contact Details:

TECEZE Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Site Reliability Engineering Manager (London) role

✨Tip Number 1

Familiarise yourself with the latest trends and technologies in Site Reliability Engineering, especially around Kubernetes and Infrastructure as Code. This will not only help you during interviews but also show your passion for the field.

✨Tip Number 2

Network with professionals in the SRE community, particularly those who work with cloud-native applications. Engaging in discussions on platforms like LinkedIn or attending relevant meetups can provide insights and potentially lead to referrals.

✨Tip Number 3

Prepare to discuss specific projects where you've implemented automation or improved system reliability. Be ready to share metrics that demonstrate your impact, as this will highlight your hands-on experience and problem-solving skills.

✨Tip Number 4

Showcase your knowledge of observability tools and telemetry systems by discussing how you've used them in past roles. This will demonstrate your understanding of maintaining high-performing and reliable infrastructure, which is crucial for this position.

We think you need these skills to ace the Site Reliability Engineering Manager (London) role

Kubernetes Administration
Terraform
Python Programming
Go Programming
Rust Programming
AWS Services
CI/CD Pipeline Development
Observability Tools
Telemetry Systems
Linux Internals
Networking Protocols
Distributed Systems Architecture
Automation Scripting
Performance Optimisation
Service Meshes
Container Security Best Practices
Multi-Cloud Architecture

Some tips for your application 🫡

Tailor Your CV: Make sure your CV highlights your experience in Site Reliability Engineering, especially your hands-on work with Kubernetes, AWS, and Infrastructure as Code. Use specific examples that demonstrate your expertise in these areas.

Craft a Compelling Cover Letter: In your cover letter, explain why you are passionate about the role and how your skills align with the company's needs. Mention your experience with CI/CD pipelines and automation, as well as your understanding of SRE principles.

Showcase Relevant Projects: If you have worked on projects involving cloud-native applications or automation scripts, be sure to include these in your application. Highlight any achievements related to performance optimisation and system reliability.

Proofread Your Application: Before submitting, carefully proofread your application for any spelling or grammatical errors. A polished application reflects your attention to detail, which is crucial for a Site Reliability Engineer.

How to prepare for a job interview at TECEZE

✨Showcase Your Technical Expertise

Be prepared to discuss your hands-on experience with Kubernetes, AWS, and Infrastructure as Code. Highlight specific projects where you've optimised performance or automated processes, as this will demonstrate your capability to handle the responsibilities of the role.

✨Demonstrate Problem-Solving Skills

Expect technical questions that assess your ability to troubleshoot and resolve issues in real-time. Prepare examples of past challenges you've faced in system reliability and how you approached solving them, particularly in multi-cloud environments.

✨Emphasise Collaboration and Communication

As an SRE Manager, you'll need to work closely with cross-functional teams. Be ready to discuss how you've successfully collaborated with product engineers and other stakeholders to enhance system performance and reliability.

✨Familiarise Yourself with Observability Tools

Since observability is a key aspect of the role, brush up on the tools and methodologies you've used for logging, metrics, and alerting. Be prepared to explain how you've implemented these in previous roles to ensure systems are observable and maintainable.

Application deadline: 2027-06-12
