Senior Site Reliability Engineering Manager, Production Engineering

London | Full-Time | £54,000 - £84,000 / year (est.) | Home office (partial)

At a Glance

  • Tasks: Lead a team to enhance platform reliability, performance, and security in cloud systems.
  • Company: Cisco ThousandEyes delivers flawless digital experiences across networks using AI and telemetry data.
  • Benefits: Enjoy a hybrid work model, career development opportunities, and a culture of continuous learning.
  • Why this job: Join a dynamic team focused on innovation and operational excellence in a fast-paced environment.
  • Qualifications: Experience in leading SRE teams, deep knowledge of Kubernetes, and strong communication skills required.
  • Other info: Diverse backgrounds are encouraged; apply even if you don't meet every qualification.

The predicted salary is between £54,000 and £84,000 per year.

Please note that we have a hybrid approach to work and would like to find someone who can come into the office in London at least one day a week.

Who We Are

Cisco ThousandEyes is a Digital Experience Assurance platform that empowers organizations to deliver flawless digital experiences across every network – even the ones they don’t own. Powered by AI and an unmatched set of cloud, Internet and enterprise network telemetry data, ThousandEyes enables IT teams to proactively detect, diagnose, and remediate issues – before they impact end-user experiences. ThousandEyes is deeply integrated across the entire Cisco technology portfolio and beyond, helping customers deploy at scale while also delivering AI-powered assurance insights within Cisco’s leading Networking, Security, Collaboration, and Observability portfolios.

About The Role

As the Senior Engineering Manager for our Production Engineering SRE team, you will lead a group of skilled engineers responsible for the design and management of large-scale, highly available distributed systems in the cloud. You will collaborate directly with application development teams to enhance the reliability, performance, and security of our platform, and work with cross-functional teams to drive operational excellence.

What You’ll Do

  • Team Leadership and Development: Build and mentor a high-performing team of Site Reliability Engineers that embed with application development teams. Foster a culture of continuous learning, innovation, and best practices. Manage performance, set goals, and provide career development opportunities.
  • Strategic Planning and Execution: Develop and implement strategies to improve platform reliability, security, and performance. Collaborate with other engineering leaders to align SRE initiatives with overall business objectives. Establish and execute a roadmap for building common platform solutions to the reliability, security, and scale challenges that engineering teams at ThousandEyes face.
  • Operational Excellence: Oversee the design and implementation of scalable operations tooling for SREs and Developers. Ensure the effective management of our 24x7 incident response and on-call rotation. Lead efforts to automate production operations and adopt robust monitoring solutions.
  • Security and Compliance: Partner with application development teams and other platform engineering teams to enhance the security posture of our containerized and cloud-native systems. Ensure compliance with Cisco and industry standards for data protection, scanning, and system security.
  • Cross-functional Collaboration: Work closely with software development teams to optimize architecture and services for availability and performance. Collaborate with product management to align SRE initiatives with product roadmaps. Represent the Production Engineering SRE team in cross-functional meetings and initiatives.

Minimum Qualifications

  • Proven track record of leading and scaling SRE teams in a fast-paced environment.
  • Deep knowledge of site reliability principles, including incident response, change management, and SLOs.
  • Expert-level knowledge of Kubernetes and its ecosystem.
  • Strong understanding of cloud platforms, preferably AWS.
  • Experience with microservices architecture and distributed systems.

Preferred Qualifications

  • Strong communication and leadership skills, with the ability to influence cross-functional stakeholders.
  • Demonstrated ability in SRE, DevOps, or related fields, with at least 3 years in a management role.
  • Background in security engineering, DevSecOps or a strong understanding of security best practices in cloud-native environments.
  • Familiarity with CNCF tools such as Prometheus, OpenTelemetry, and ArgoCD.

Cisco values the perspectives and skills that emerge from employees with diverse backgrounds. That’s why Cisco is expanding the boundaries of discovering top talent by not only focusing on candidates with educational degrees and experience but also placing more emphasis on unlocking potential. We believe that everyone has something to offer and that diverse teams are better equipped to solve problems, innovate, and create a positive impact. We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification. Research shows that people from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy. We urge you not to prematurely exclude yourself and to apply if you’re interested in this work.

Senior Site Reliability Engineering Manager, Production Engineering employer: ThousandEyes

At Cisco ThousandEyes, we pride ourselves on fostering a dynamic and inclusive work culture that prioritises employee growth and innovation. As a Senior Site Reliability Engineering Manager in London, you will lead a talented team while enjoying the benefits of a hybrid work model, continuous learning opportunities, and a commitment to operational excellence. Our collaborative environment empowers you to make a meaningful impact on digital experiences, all while being part of a diverse team that values unique perspectives.

Contact Detail:

ThousandEyes Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Senior Site Reliability Engineering Manager, Production Engineering role

✨Tip Number 1

Familiarise yourself with the latest trends in site reliability engineering, especially around Kubernetes and cloud platforms like AWS. This knowledge will not only help you during interviews but also demonstrate your commitment to staying current in a fast-paced environment.

✨Tip Number 2

Network with current or former employees of Cisco ThousandEyes on platforms like LinkedIn. Engaging with them can provide valuable insights into the company culture and expectations, which can be beneficial when discussing your fit for the role.

✨Tip Number 3

Prepare to discuss specific examples of how you've led SRE teams and improved platform reliability in your previous roles. Highlighting your leadership skills and strategic planning experience will resonate well with the hiring team.

✨Tip Number 4

Showcase your understanding of security best practices in cloud-native environments. Given the emphasis on security in the job description, being able to articulate your experience in this area will set you apart from other candidates.

We think you need these skills to ace the Senior Site Reliability Engineering Manager, Production Engineering role

Team Leadership
Site Reliability Engineering (SRE)
Incident Response Management
Change Management
Service Level Objectives (SLOs)
Kubernetes Expertise
Cloud Platform Knowledge (AWS preferred)
Microservices Architecture
Distributed Systems Understanding
Operational Excellence
Automation of Production Operations
Monitoring Solutions Implementation
Security Best Practices in Cloud-Native Environments
Cross-Functional Collaboration
Strong Communication Skills
Influencing Stakeholders
Performance Management
Continuous Learning and Innovation

Some tips for your application 🫡

Understand the Role: Take time to thoroughly read the job description for the Senior Site Reliability Engineering Manager position. Understand the key responsibilities and qualifications required, especially focusing on team leadership, operational excellence, and security compliance.

Tailor Your CV: Customise your CV to highlight relevant experience in site reliability engineering, team management, and cloud platforms like AWS. Emphasise your achievements in leading SRE teams and any specific projects that align with the role's requirements.

Craft a Compelling Cover Letter: Write a cover letter that showcases your passion for the role and the company. Mention your understanding of Cisco ThousandEyes' mission and how your skills can contribute to enhancing digital experiences. Be sure to address your leadership style and approach to fostering a high-performing team.

Showcase Relevant Skills: In your application, clearly outline your expertise in Kubernetes, microservices architecture, and incident response. Provide examples of how you've successfully implemented strategies to improve platform reliability and security in previous roles.

How to prepare for a job interview at ThousandEyes

✨Showcase Your Leadership Skills

As a Senior Site Reliability Engineering Manager, you'll need to demonstrate your ability to lead and mentor teams. Prepare examples of how you've built high-performing teams in the past and fostered a culture of continuous learning and innovation.

✨Understand the Technical Landscape

Make sure you're well-versed in site reliability principles, Kubernetes, and cloud platforms like AWS. Brush up on your knowledge of microservices architecture and distributed systems, as these are crucial for the role.

✨Prepare for Cross-Functional Collaboration

This role requires working closely with various teams, including software development and product management. Think of instances where you've successfully collaborated across departments and be ready to discuss how you can align SRE initiatives with business objectives.

✨Emphasise Security Awareness

Given the focus on security and compliance, be prepared to discuss your experience with security best practices in cloud-native environments. Highlight any relevant experience in DevSecOps or security engineering to show your understanding of enhancing security posture.
