Tbwa Chiat/Day Inc

Site Reliability Engineer, Observability (London, United Kingdom)

London | Full-Time | £43,200 - £72,000 / year (est.) | No home office possible

At a Glance

Tasks: Join our team to enhance observability for a cutting-edge digital experience platform.
Company: Cisco ThousandEyes empowers organizations to deliver flawless digital experiences across every network.
Benefits: Enjoy a collaborative culture, opportunities for growth, and the chance to work with innovative technologies.
Why this job: Be part of a dynamic team that values diversity and innovation while making a real impact.
Qualifications: Strong coding skills in Python or Go; familiarity with observability concepts and AWS services.
Other info: We encourage applicants from diverse backgrounds and those who may not meet every qualification.

Who We Are

Cisco ThousandEyes is a Digital Experience Assurance platform that empowers organizations to deliver flawless digital experiences across every network – even the ones they don’t own. Powered by AI and an unmatched set of cloud, internet and enterprise network telemetry data, ThousandEyes enables IT teams to proactively detect, diagnose, and remediate issues – before they impact end-user experiences.

ThousandEyes is deeply integrated across the entire Cisco technology portfolio and beyond, helping customers deploy at scale while also delivering AI-powered assurance insights within Cisco’s leading Networking, Security, Collaboration, and Observability portfolios.

About The Role

The Site Reliability Engineering team focused on Observability provides the tools, services, and infrastructure used to monitor and observe the ThousandEyes platform. Using cloud-native tools like Prometheus, Grafana, Kibana, and even ThousandEyes itself, we enable our developers to instrument, analyze, and monitor their applications. The Site Reliability Engineer in this role will own our observability stack together with the team and work with developers to continuously improve our view of the platform.

What You’ll Do

Design and implement strategies that enhance visibility as we expand our platform to multi-region scale.
Design, deploy, and maintain cloud-native monitoring services that are elastic and resilient to failure.
Establish standards and best practices for instrumenting container-based services and cloud-managed services.
Maintain our alerting pipeline so that notifications are timely, accurate, and directed to the appropriate channels.
Automate our monitoring platforms so that they scale effortlessly and support a self-service approach.
Participate in, and contribute to improving, our 24×7 incident response and on-call rotation.

Qualifications

Ability to write high quality code, preferably in Python or Go.
Passion for SRE / DevOps roles and Operational Excellence.
Familiarity with the most common Observability concepts: metrics, logs and traces.
Understanding of monitoring and alerting systems. Experience with the Grafana Observability Stack is a plus.
Good understanding of AWS services.
Infrastructure as Code skills, ideally with Terraform and Kubernetes.

Cisco values the perspectives and skills that emerge from employees with diverse backgrounds. That’s why Cisco is expanding the boundaries of discovering top talent by not only focusing on candidates with educational degrees and experience but also placing more emphasis on unlocking potential. We believe that everyone has something to offer and that diverse teams are better equipped to solve problems, innovate, and create a positive impact. We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification. Research shows that people from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy. We urge you not to prematurely exclude yourself and to apply if you’re interested in this work.

Cisco is an Affirmative Action and Equal Opportunity Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis. Cisco will consider for employment, on a case by case basis, qualified applicants with arrest and conviction records.

Employer: Tbwa Chiat/Day Inc

At Cisco ThousandEyes, we pride ourselves on being an exceptional employer that fosters a culture of innovation and collaboration in the heart of London. Our commitment to employee growth is evident through continuous learning opportunities and a supportive environment that values diverse perspectives. With competitive benefits and a focus on work-life balance, we empower our Site Reliability Engineers to thrive while contributing to meaningful projects that enhance digital experiences globally.

Contact Details:

Tbwa Chiat/Day Inc Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Site Reliability Engineer, Observability role in London, United Kingdom

✨Tip Number 1

Familiarize yourself with the specific tools mentioned in the job description, such as Prometheus, Grafana, and Kibana. Having hands-on experience or projects showcasing your skills with these tools can set you apart during discussions.

✨Tip Number 2

Understand the principles of cloud-native monitoring and how to implement resilient systems. Being able to discuss strategies for enhancing visibility at multi-region scale will demonstrate your alignment with the role's requirements.

✨Tip Number 3

Showcase your coding skills, particularly in Python or Go, through personal projects or contributions to open-source. This will not only highlight your technical abilities but also your passion for SRE and DevOps roles.

✨Tip Number 4

Engage with the SRE community by participating in forums or attending meetups. Networking with professionals in the field can provide insights and potentially lead to referrals, increasing your chances of landing the job.

We think you need these skills to ace the Site Reliability Engineer, Observability role in London, United Kingdom

High-Quality Code Writing (Python or Go)

Site Reliability Engineering (SRE) Knowledge

DevOps Practices

Operational Excellence

Observability Concepts (Metrics, Logs, Traces)

Monitoring and Alerting Systems Understanding

Grafana Observability Stack Experience

AWS Services Proficiency

Infrastructure as Code (Terraform, Kubernetes)

Cloud-Native Monitoring Services Design

Automation Skills

Incident Response and On-Call Rotation Participation

Collaboration with Development Teams

Elastic and Resilient System Design

Some tips for your application 🫡

Understand the Role: Make sure to thoroughly read the job description for the Site Reliability Engineer position. Understand the key responsibilities and qualifications required, especially around observability tools and cloud-native services.

Highlight Relevant Experience: In your CV and cover letter, emphasize any experience you have with Python or Go, as well as your familiarity with observability concepts like metrics, logs, and traces. Mention any specific projects where you've used Grafana or AWS services.

Showcase Your Passion: Express your enthusiasm for SRE and DevOps roles in your application. Share examples of how you've contributed to operational excellence or improved monitoring systems in previous positions.

Tailor Your Application: Customize your CV and cover letter to reflect the values and mission of Cisco ThousandEyes. Highlight your commitment to diversity and inclusion, and how your unique background can contribute to the team.

How to prepare for a job interview at Tbwa Chiat/Day Inc

✨Understand the Observability Stack

Make sure you have a solid grasp of the tools mentioned in the job description, like Prometheus, Grafana, and Kibana. Be prepared to discuss how you've used these tools in past projects or how you would implement them in a new environment.

✨Showcase Your Coding Skills

Since the role requires writing high-quality code, brush up on your Python or Go skills. Be ready to solve coding challenges during the interview and explain your thought process clearly.

✨Discuss Automation and Infrastructure as Code

Highlight your experience with automation and Infrastructure as Code, particularly with Terraform and Kubernetes. Share specific examples of how you've implemented these practices to improve efficiency in previous roles.

✨Emphasize Your Passion for SRE/DevOps

Express your enthusiasm for Site Reliability Engineering and DevOps. Discuss any relevant projects or experiences that demonstrate your commitment to operational excellence and continuous improvement.

Application deadline: 2027-02-28