Senior Site Reliability Engineer, Production Engineering - Cisco ThousandEyes

Full-Time, £48,000 to £72,000 per year (est.), home office (partial)

At a Glance

  • Tasks: Design and manage large-scale distributed systems, enhancing reliability and performance.
  • Company: Join Cisco ThousandEyes, a leader in Digital Experience Assurance.
  • Benefits: Competitive salary, inclusive culture, and endless growth opportunities.
  • Why this job: Make a real impact on digital experiences with cutting-edge technology.
  • Qualifications: Expertise in Kubernetes, Python or Go, and cloud platforms like AWS.
  • Other info: Collaborative environment focused on innovation and continuous improvement.

The predicted salary is between £48,000 and £72,000 per year.

Meet the Team

Cisco ThousandEyes is a Digital Experience Assurance platform that empowers organizations to deliver flawless digital experiences across every network - even the ones they don’t own. Powered by AI and an unmatched set of cloud, internet and enterprise network telemetry data, ThousandEyes enables IT teams to proactively detect, diagnose, and remediate issues - before they impact end-user experiences. ThousandEyes is deeply integrated across the entire Cisco technology portfolio and beyond, helping customers deploy at scale while also delivering AI-powered assurance insights within Cisco’s leading Networking, Security, Collaboration, and Observability portfolios.

Your Impact

We are seeking a skilled Senior Site Reliability Engineer (SRE) in Production Engineering with a strong background in SaaS and operations. You will design and manage large-scale, highly available distributed systems in the cloud, collaborating directly with application development teams to enhance the reliability, performance, and security of our platform.

  • Technical Leadership & Collaboration: Forge strong partnerships with cross-functional stakeholders to identify requirements and deliver solutions that address project and departmental objectives.
  • Solution Design & Deployment: Architect and implement small to mid-size or moderately complex solutions that elevate reliability, availability, latency, and performance across diverse environments and customer segments.
  • Automation & Service Reliability: Combine expertise in design, automation, deployment, and coding to enhance system reliability for new and existing platforms, tailoring approaches to regional, national, or customer-specific needs.
  • High Availability & Disaster Recovery: Develop and validate automated high-availability and disaster recovery mechanisms, ensuring systems are robust, scalable, and support rapid velocity in delivery. Take part in regular disaster recovery drills.
  • Capacity Planning & Reporting: Analyze resource usage and produce actionable reports to forecast and address capacity constraints, supporting proactive decision-making and operational excellence.
  • Monitoring & Tooling: Design, build, and deploy tools that deliver comprehensive visibility into infrastructure performance and reliability. Automate key platform functions for efficiency and resilience.
  • Incident Response & Continuous Improvement: Monitor production environments, collaborate with Development and Operations to diagnose issues, and develop monitoring tools to preemptively identify and resolve problems. Serve as on-call Site Reliability Engineer (SRE), lead post-mortems, and deliver clear root cause analyses.
  • Security & Compliance: Embed strong security controls in architectural design, collaborate with security teams to enhance safeguards, and contribute to incident response efforts as needed. Work closely with specialized security teams to ensure that platform components and infrastructure are secured to the highest possible standard.

Minimum Qualifications

  • Expert-level knowledge of Kubernetes and its ecosystem.
  • Proficiency in software development with languages such as Python or Go.
  • In-depth knowledge of cloud providers, preferably AWS.
  • Solid conceptual and practical knowledge in Web technologies, Networking, and Linux.
  • Knowledge of Site Reliability principles: Incident Response, Change Management, Distributed Systems, Deployment Strategies, and SLOs.

Preferred Qualifications

  • Familiarity with best practices for operating a large-scale, highly available enterprise platform.
  • 5+ years of experience in a related role.
  • Proven ability to build and implement scalable and well-tested solutions.
  • Excellent communication and documentation skills.
  • Strong sense of ownership, drive, and attention to detail.

Why Cisco?

At Cisco, we’re revolutionizing how data and infrastructure connect and protect organizations in the AI era – and beyond. We’ve been innovating fearlessly for 40 years to create solutions that power how humans and technology work together across the physical and digital worlds. These solutions provide customers with unparalleled security, visibility, and insights across the entire digital footprint. Fueled by the depth and breadth of our technology, we experiment and create meaningful solutions. Add to that our worldwide network of doers and experts, and you’ll see that the opportunities to grow and build are limitless. We work as a team, collaborating with empathy to make really big things happen on a global scale. Because our solutions are everywhere, our impact is everywhere. We are Cisco, and our power starts with you.

Cisco is an Affirmative Action and Equal Opportunity Employer, and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis. Cisco will consider for employment, on a case-by-case basis, qualified applicants with arrest and conviction records.

Senior Site Reliability Engineer, Production Engineering - Cisco ThousandEyes employer: Cisco Systems

Cisco ThousandEyes is an exceptional employer that fosters a collaborative and innovative work culture, empowering employees to make a significant impact in the realm of digital experience assurance. With a strong emphasis on professional growth, Cisco offers extensive opportunities for skill development and career advancement, all while working in a dynamic environment that values diversity and inclusion. Located in a vibrant tech hub, employees benefit from a supportive community and access to cutting-edge technology, making it an ideal place for those seeking meaningful and rewarding employment.

Contact Detail:

Cisco Systems Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Senior Site Reliability Engineer, Production Engineering role at Cisco ThousandEyes

✨Tip Number 1

Network like a pro! Reach out to current employees at Cisco ThousandEyes on LinkedIn or other platforms. Ask them about their experiences and any tips they might have for the interview process. Building connections can give you valuable insights and even a foot in the door!

✨Tip Number 2

Prepare for technical interviews by brushing up on your Kubernetes and cloud knowledge. We recommend setting up mock interviews with friends or using online platforms to practice coding challenges. The more comfortable you are with the tech, the better you’ll perform!

✨Tip Number 3

Showcase your problem-solving skills! During interviews, be ready to discuss past projects where you tackled complex issues. Use the STAR method (Situation, Task, Action, Result) to structure your answers and highlight your impact.

✨Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows that you’re genuinely interested in joining the Cisco ThousandEyes team. Let’s get that application in!

We think you need these skills to ace the Senior Site Reliability Engineer, Production Engineering role at Cisco ThousandEyes

Kubernetes
Python
Go
AWS
Web Technologies
Networking
Linux
Site Reliability Principles
Incident Response
Change Management
Distributed Systems
Deployment Strategies
SLOs
Automation
Monitoring Tools

Some tips for your application 🫡

Tailor Your Application: Make sure to customise your CV and cover letter to highlight your experience with Kubernetes, cloud providers, and SaaS operations. We want to see how your skills align with the role of a Senior Site Reliability Engineer.

Showcase Your Technical Skills: Don’t hold back on showcasing your expertise in Python or Go, as well as your knowledge of Site Reliability principles. We love seeing concrete examples of how you've applied these skills in past roles.

Be Clear and Concise: When writing your application, keep it clear and to the point. Use bullet points where possible to make it easy for us to see your achievements and qualifications at a glance.

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it shows you’re keen on joining our team!

How to prepare for a job interview at Cisco Systems

✨Know Your Tech Inside Out

Make sure you brush up on your Kubernetes knowledge and be ready to discuss your experience with cloud providers like AWS. Be prepared to share specific examples of how you've designed and managed distributed systems in the past.

✨Showcase Your Problem-Solving Skills

Prepare to talk about incidents you've handled in production environments. Highlight your approach to diagnosing issues and the tools you used to monitor and resolve them. This will demonstrate your ability to think on your feet and contribute to incident response.

✨Emphasise Collaboration

Cisco values teamwork, so be ready to discuss how you've collaborated with cross-functional teams in previous roles. Share examples of how you’ve forged partnerships to deliver solutions that meet project objectives, showcasing your communication skills.

✨Demonstrate a Security Mindset

Since security is a key aspect of the role, prepare to discuss how you've embedded security controls in your architectural designs. Talk about any experiences working with security teams and how you’ve contributed to incident response efforts.
