Site Reliability Engineering Manager

Slough | Temporary | £60,000 - £84,000 per year (est.) | Home office (partial)

At a Glance

  • Tasks: Lead the design and delivery of scalable, reliable infrastructure and services.
  • Company: Join a top player in the mobile industry, shaping the future of technology.
  • Benefits: Enjoy hybrid remote work, competitive pay, and opportunities for professional growth.
  • Why this job: Be part of a dynamic team driving innovation in cloud-native applications.
  • Qualifications: 5+ years in Site Reliability Engineering with strong Kubernetes and coding skills.
  • Other info: This is a 12-month contract role, outside IR35, offering £300 to £330 per day.

The predicted salary is between £60,000 and £84,000 per year.

Location: Hybrid Remote – London EC2M

Contract: 12 months

Rate: Outside IR35 - £300 to £330 per day

About the Role:

We are partnering with one of the top companies in the mobile industry to hire a Site Reliability Engineering (SRE) Manager. In this role, you will collaborate with cross-functional teams to drive the design, development, and delivery of high-performing, scalable, and reliable infrastructure and services. You’ll be responsible for building robust systems, automating operations, and enhancing observability and deployment pipelines for modern cloud-native applications.

Key Responsibilities:

  • System Reliability & Performance: Maintain and scale critical services and infrastructure. Identify performance bottlenecks and work closely with product engineers to optimise applications.
  • Kubernetes Operations: Administer, scale, and troubleshoot clusters in GKE, EKS, or other Kubernetes environments.
  • Infrastructure as Code (IaC): Design and maintain scalable infrastructure using Terraform and automate deployments across public, private, or hybrid clouds (mainly AWS).
  • CI/CD Pipeline Enhancement: Build and improve robust CI/CD pipelines to support fast and safe deployment cycles.
  • Observability & Monitoring: Implement code-based instrumentation and telemetry. Ensure systems are observable with tools for logging, metrics, and alerting.
  • Automation & Scripting: Write tooling and automation scripts in Python, Go, or Rust to reduce toil and manual intervention.
  • Storage & Networking: Manage and optimise storage services like Amazon S3 or Google Cloud Storage (GCS). Resolve complex networking issues in multi-cloud environments.

Essential Requirements:

  • 5+ years of hands-on experience as a Site Reliability Engineer.
  • Proven expertise in Kubernetes (GKE/EKS).
  • Strong proficiency in Python, Go, or Rust.
  • Solid experience with AWS and Infrastructure as Code using Terraform.
  • Deep understanding of Linux internals, standard networking protocols, and distributed systems architecture.
  • Hands-on experience with automation and performance optimisation.
  • Strong knowledge of SRE principles and methodologies.
  • Experience with observability tools and telemetry systems.
  • Exposure to Google Cloud Platform (GCP).
  • Familiarity with hybrid or multi-cloud architecture.
  • Experience with service meshes or edge proxies (e.g., Envoy, Istio).
  • Working knowledge of container security best practices.

Contact Detail:

TECEZE Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Site Reliability Engineering Manager role

✨Tip Number 1

Familiarise yourself with the latest trends and technologies in Site Reliability Engineering, especially around Kubernetes and Infrastructure as Code. This will not only help you in interviews but also show your passion for the field.

✨Tip Number 2

Network with professionals in the SRE community through platforms like LinkedIn or relevant tech meetups. Engaging with others can provide insights into the role and may even lead to referrals.

✨Tip Number 3

Prepare to discuss specific projects where you've implemented automation or improved system reliability. Real-world examples can demonstrate your expertise and problem-solving skills effectively.

✨Tip Number 4

Stay up to date with observability tools and practices, as observability is a key responsibility in this role. Being able to articulate how you've used these tools in past roles will set you apart from other candidates.

We think you need these skills to ace the Site Reliability Engineering Manager role

Kubernetes Administration
Terraform
Python Programming
Go Programming
Rust Programming
AWS Proficiency
CI/CD Pipeline Development
Observability Tools Implementation
Automation Scripting
Linux Internals Knowledge
Networking Protocols Understanding
Distributed Systems Architecture
Performance Optimisation
SRE Principles and Methodologies
Multi-Cloud Architecture Familiarity
Container Security Best Practices

Some tips for your application 🫡

Tailor Your CV: Make sure your CV highlights your experience in Site Reliability Engineering, particularly your hands-on work with Kubernetes, AWS, and Infrastructure as Code. Use specific examples that demonstrate your expertise in these areas.

Craft a Compelling Cover Letter: Write a cover letter that showcases your passion for the role and the mobile industry. Mention how your skills align with the key responsibilities listed in the job description, such as CI/CD pipeline enhancement and automation.

Highlight Relevant Projects: In your application, include details about specific projects where you have successfully implemented observability tools or automated deployments. This will help illustrate your practical experience and problem-solving abilities.

Showcase Soft Skills: Don’t forget to mention your ability to collaborate with cross-functional teams. Highlight any leadership experience you have, as this role requires managing teams and driving initiatives across departments.

How to prepare for a job interview at TECEZE

✨Showcase Your Technical Expertise

Be prepared to discuss your hands-on experience with Kubernetes, AWS, and Infrastructure as Code. Highlight specific projects where you've optimised performance or automated processes, as this will demonstrate your capability to handle the responsibilities of the role.

✨Demonstrate Problem-Solving Skills

Expect technical questions that assess your ability to troubleshoot and resolve issues in complex systems. Prepare examples of past challenges you've faced and how you approached them, particularly in multi-cloud environments.

✨Familiarise Yourself with SRE Principles

Understand the core principles of Site Reliability Engineering and be ready to discuss how you've applied these methodologies in your previous roles. This will show your alignment with the company's focus on reliability and performance.

✨Prepare Questions for the Interviewers

Have insightful questions ready about the company's infrastructure, team dynamics, and future projects. This not only shows your interest in the role but also helps you gauge if the company is the right fit for you.

Application deadline: 2027-06-12
