Site Reliability Engineer (Prometheus and Grafana)

London | Full-Time | £36,000 - £60,000 / year (est.) | No home office possible

At a Glance

  • Tasks: Design and maintain monitoring systems using Prometheus and Grafana.
  • Company: Join a global cybersecurity team focused on Data Loss Prevention.
  • Benefits: Gain experience in a cloud-first environment with opportunities for growth.
  • Why this job: Perfect for those wanting to dive into cybersecurity while enhancing their engineering skills.
  • Qualifications: Strong experience with Prometheus, Splunk, and observability principles required.
  • Other info: Participate in a 24/7 on-call support rota and collaborate in an Agile setting.

The predicted salary is between £36,000 and £60,000 per year.

Join a global team of engineers, operators, and Agile practitioners responsible for building and operating a world-class Data Loss Prevention (DLP) infrastructure. This role sits within the Cybersecurity organization and focuses on enhancing observability and telemetry across the DLP stack to support a cloud-first strategy while maintaining strong on-premises capabilities. This is an exciting opportunity for engineers with strong SRE and monitoring experience, and also a great entry point for professionals looking to transition into cybersecurity.

Key Responsibilities

  • Design and maintain Prometheus metrics collection and PromQL queries
  • Build, review, and optimize Grafana and Splunk dashboards using observability best practices (e.g., Four Golden Signals, RED methodology)
  • Refine alerting rules across tools like PagerDuty, Prometheus, and Splunk to eliminate noise and identify gaps
  • Work closely with engineering squads to implement and maintain SLOs, SLIs, and error budgets
  • Operate Prometheus in agent mode and troubleshoot issues
  • Use telemetry data to generate actionable insights for the DLP teams
  • Drive continuous improvement of monitoring and observability systems
  • Participate in a 24/7 on-call support rota for DLP products
  • Collaborate in a DevOps and Agile environment

Required Skills and Experience

  • Strong hands-on experience with Prometheus and PromQL
  • Solid experience with Splunk dashboarding and queries
  • Deep understanding of observability and monitoring principles
  • Familiarity with SRE practices, SLOs, SLIs, and error budget management
  • Experience with PagerDuty or similar alerting/orchestration platforms
  • Fluent in at least one programming or scripting language
  • Knowledge of CI/CD tools (e.g., Jenkins, Bitbucket)
  • Experience working in cloud environments (AWS or similar) or Unix/Linux systems
  • Excellent collaboration, communication, and problem-solving skills

Nice to Have

Experience with:

  • Cybersecurity or DLP products
  • Incident, problem, and change management tools
  • OpenTelemetry or telemetry pipeline tooling
  • Automation and scripting for monitoring
  • Working in Agile or operational environments

Why Join?

  • Work on a globally distributed, high-impact security team
  • Learn and grow in a DevOps-driven, cloud-first organization
  • Transition into cybersecurity or expand your existing expertise

Application Process

Please include your first and last name, email address, phone number (including country code), and CV / resume. Additionally, indicate whether you are currently eligible to work in the country you are applying to (work permit, visa, or citizenship): yes or no.

Site Reliability Engineer (Prometheus and Grafana) employer: Robert Walters

As a Site Reliability Engineer in London, you will be part of a dynamic and innovative team dedicated to enhancing cybersecurity through cutting-edge technology. Our company fosters a collaborative work culture that prioritises employee growth, offering extensive learning opportunities in a DevOps-driven environment. With a focus on work-life balance and a commitment to continuous improvement, we provide a unique chance to make a significant impact while advancing your career in the exciting field of cybersecurity.

Contact Details:

Robert Walters Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Site Reliability Engineer (Prometheus and Grafana) role

✨Tip Number 1

Familiarise yourself with Prometheus and Grafana by exploring their documentation and community forums. Engaging with these resources can help you understand best practices and common pitfalls, which will be beneficial during interviews.

✨Tip Number 2

Join online communities or local meetups focused on Site Reliability Engineering and observability tools. Networking with professionals in the field can provide insights into the role and may even lead to referrals.

✨Tip Number 3

Consider contributing to open-source projects related to monitoring and observability. This not only enhances your skills but also showcases your commitment and expertise to potential employers.

✨Tip Number 4

Prepare for technical interviews by practising problem-solving scenarios that involve SLOs, SLIs, and error budgets. Being able to discuss these concepts confidently will demonstrate your understanding of SRE principles.
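One way to practise is to work through the arithmetic behind an error budget by hand. The sketch below is only an illustration: the function names and the figures in the usage note are hypothetical and do not come from the role description. It assumes a simple request-based availability SLI (good requests divided by total requests) measured over a single window.

```python
def sli(good_events: int, total_events: int) -> float:
    """Availability SLI: the fraction of events in the window that were good."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def error_budget(slo_target: float, total_events: int) -> float:
    """Number of events that may fail in the window while still meeting the SLO."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return (1.0 - slo_target) * total_events


def budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent; a negative value means the SLO is breached."""
    allowed_failures = error_budget(slo_target, total_events)
    actual_failures = total_events - good_events
    return 1.0 - actual_failures / allowed_failures


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 600 failures, 99.9% SLO target.
    print(sli(999_400, 1_000_000))
    print(error_budget(0.999, 1_000_000))
    print(budget_remaining(0.999, 999_400, 1_000_000))
```

With these made-up figures, a 99.9% target over one million requests allows roughly 1,000 failures, so 600 failures leave about 40% of the budget unspent. Being able to talk through a calculation like this, and what you would do as the remaining budget approaches zero, is a reasonable way to rehearse for SLO questions.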

We think you need these skills to ace the Site Reliability Engineer (Prometheus and Grafana) role

Hands-on experience with Prometheus
Proficiency in PromQL
Experience with Splunk dashboarding and queries
Understanding of observability and monitoring principles
Familiarity with SRE practices
Knowledge of SLOs, SLIs, and error budget management
Experience with PagerDuty or similar alerting platforms
Fluency in at least one programming or scripting language
Knowledge of CI/CD tools (e.g., Jenkins, Bitbucket)
Experience working in cloud environments (AWS or similar)
Proficiency in Unix/Linux systems
Excellent collaboration skills
Strong communication skills
Problem-solving skills
Ability to participate in a 24/7 on-call support rota

Some tips for your application 🫡

Tailor Your CV: Make sure your CV highlights your experience with Prometheus, Grafana, and any relevant SRE practices. Use specific examples that demonstrate your skills in observability and monitoring.

Craft a Strong Cover Letter: In your cover letter, express your enthusiasm for the role and the company. Mention how your background aligns with the responsibilities listed, particularly your experience with metrics collection and alerting tools.

Showcase Relevant Skills: Clearly outline your hands-on experience with Prometheus and Splunk in your application. Include any familiarity with CI/CD tools and cloud environments, as these are crucial for the role.

Highlight Collaboration Experience: Since the role involves working closely with engineering squads, emphasise any past experiences where you collaborated in a DevOps or Agile environment. This will show your ability to work effectively within teams.

How to prepare for a job interview at Robert Walters

✨Showcase Your Technical Skills

Be prepared to discuss your hands-on experience with Prometheus and PromQL in detail. Highlight specific projects where you've designed metrics collection or optimised dashboards, as this will demonstrate your technical expertise relevant to the role.

✨Understand Observability Principles

Familiarise yourself with observability best practices, such as the Four Golden Signals and the RED methodology. Be ready to explain how you have applied these principles in past roles to enhance monitoring and alerting systems.

✨Prepare for Scenario-Based Questions

Expect questions that assess your problem-solving skills in real-world scenarios. Think of examples where you've refined alerting rules or collaborated with engineering teams to implement SLOs/SLIs, as these experiences are crucial for the role.

✨Demonstrate Collaboration and Communication Skills

Since the role involves working closely with various teams, be prepared to discuss how you've effectively communicated and collaborated in a DevOps or Agile environment. Share specific instances where your communication skills led to successful project outcomes.

Application deadline: 2027-05-26
