Staff Site Reliability Engineer

Full-Time | £64,070 - £84,569 / year (est.) | Home office (partial)

At a Glance

  • Tasks: Ensure systems are reliable, scalable, and efficient while optimising performance and incident response.
  • Company: Join OVO, a mission-driven company tackling the climate crisis with innovative technology.
  • Benefits: Competitive salary, 34 days holiday, health benefits, and flexible perks.
  • Why this job: Make a real impact on sustainability while working with cutting-edge cloud technologies.
  • Qualifications: Experience in software engineering and cloud ecosystems; strong problem-solving skills.
  • Other info: Collaborative environment with opportunities for personal growth and community involvement.

The predicted salary is between £64,070 and £84,569 per year.

Location: Hub Based - Hybrid

Salary banding: £64,070 - £84,569

Experience: Mid-level/Expert

Working pattern: Full-Time

Reporting to: Principal Cloud Platform Engineer

Sponsorship: Unfortunately we are unable to offer sponsorship for this role.

This role in 3 words: Automation, Resilience, Observability

Top 3 qualities for this role: Analytical, Proactive, Collaborative

Where you’ll work: Depending on the needs of your business area, we expect hub-based people to be in the office at least once a week, and to go to OVO Connection events in person. Unless your role requires field-based work, you’ll be assigned to the closest of our three hub offices: Bristol, Glasgow, or London. Each hub has accessible spaces to park your laptop and is designed to inspire people, help them connect and bring big ideas to life.

Everyone belongs at OVO: At OVO, we are on a mission to solve one of humanity's biggest challenges, the climate crisis. And we know it takes all of us to change the world. That’s why we need diverse people from all abilities, gender identities, ethnicities, ages, sexual orientations, life experiences and backgrounds to join us.

Teamworking for the planet: Everything we do here spins around Plan Zero. So, naturally, the team you’ll be joining plays a gigantic role in making that happen. Here’s how: Site Reliability Engineering is at the heart of OVO's customer-focused technology transformation, building and maintaining scalable, efficient, and reliable platforms for OVO's applications and services. The goal of Site Reliability Engineering is to enhance the reliability, performance, and cost-efficiency of OVO's systems, enabling teams to confidently deliver robust services in GCP. This focus on smart and efficient usage of cloud services also contributes to reducing CO2 emissions, which is at the heart of OVO's Plan Zero.

This role in a nutshell: As a Site Reliability Engineer at OVO, you’ll help ensure our systems are reliable, scalable, and efficient. You’ll focus on maintaining high service availability, improving performance, and optimising how we monitor and respond to incidents. Your expertise in reliability engineering will support continuous improvement, proactively resolve issues before they impact users, and strengthen the overall resilience of our infrastructure.

Your key outcomes will be:

  • Developing, Refining, and Automating Monitoring Systems: Design, manage and enhance monitoring, alerting and observability systems such as Datadog, Prometheus and Grafana, ensuring they deliver meaningful insights and effective alerting. You'll also automate repetitive monitoring tasks to improve efficiency.
  • Managing SLOs/SLIs and Improving Incident Response: Define and track SLOs and SLIs for key services, contributing to better reliability insights. You'll also help refine incident response processes, support on-call operations, and improve tooling and communication during incidents.
  • Incident Management and Post-Mortem Analysis: Play a key role in resolving complex production incidents, leading or supporting technical response efforts. Following incidents, you’ll conduct blameless post-mortems to uncover root causes and drive lasting improvements.
  • Cost Optimisation Implementation: Assess infrastructure usage and apply approved strategies to optimise cloud costs - balancing resource efficiency with performance and reliability.
  • Capacity Planning, Performance Tuning & Resilience: Using monitoring and load testing data, you’ll support capacity planning, recommend performance improvements and help implement resilience best practices across systems.
  • Collaboration and Knowledge Sharing: Work closely with engineering, QA, security and product teams to embed reliability practices, document key processes and mentor peers to support collective learning and growth.
  • Design Review Input: Take part in design reviews, offering guidance on how to improve reliability, scalability and day-to-day operability within system architecture.
  • Community of Practice: Actively contribute to your Community of Practice - leading discussions, sharing experiences, mentoring others and helping shape content and capability growth within your area of expertise.

You’ll be successful in this role if you…

  • Have a Software Engineering Background: You have professional experience in programming languages such as Python, TypeScript, Go, or Java, and you apply software best practices (CI/CD, unit testing, code reviews) to infrastructure.
  • Experienced with the Cloud: You have hands-on experience navigating the complexities of public cloud ecosystems (AWS, GCP, or Azure) and understand the nuances of cloud-native networking and storage. You can demonstrate an understanding of how distributed systems may fail, and how to design for fault tolerance.
  • Infrastructure as Code (IaC) Expert: You have advanced experience with Terraform, Pulumi, or Crossplane to manage at-scale infrastructure.
  • Data-Driven Mindset: You use metrics and logs to drive engineering decisions. You understand the foundations of SLOs and error budgets.
  • Problem Solver: You enjoy complex debugging. You can dive into the Linux kernel or network stack to find the root cause of a performance bottleneck.
  • Mentor & Advocate: You are passionate about teaching 'The SRE Way' to engineers, helping them take ownership of their services' reliability.
  • Efficiency and Cost Engineering Mindset: You treat capacity planning, performance tuning, and cost optimisation as software engineering challenges rather than administrative tasks. You naturally lean toward building 'efficiency-as-code'.

What’s in it for you: We’ll pay you between £64,070 and £84,569, depending on your specific skills and experience. We keep our pay ranges broad on purpose to give us, and you, flexibility to match your experience to our zero carbon mission. You’ll be eligible for an on-target bonus of 15% and we have a single bonus plan focused on collective performance towards Plan Zero. We also offer green benefits and progressive policies, including 9% Flex Pay on top of your salary (4% auto-enrolled into pension, 5% for you to allocate to benefits, including pensions).

Here’s a taster of what’s on offer:

  • 34 days of holiday (including bank holidays)
  • Health benefits: healthcare cash plan or private medical insurance, critical illness cover, life assurance, health assessments, and more
  • Wellbeing: gym membership, travel insurance, workplace ISA, will writing services, dental insurance, and more
  • Lifestyle: extra holiday buying, discount dining, home & tech loans, and charitable giving via give-as-you-earn
  • Home: up to £400 towards any OVO Energy plan, discounts on solar, smart thermostats and EV chargers
  • Commute: ultra-low emission car leasing, cycle to work scheme and public transport loans

Want to hear about our full range of flexible benefits and progressive people policies? Our People Team can tell you everything you need to know.

Belonging: We have 8 Belonging Networks at OVO. When you join, you can participate in networks to support an inclusive workplace.

Oh, and one last thing: If you tick off most of our boxes but not every single thing, go ahead and apply. If you have additional requirements, there is a space on the application form to let us know.

Staff Site Reliability Engineer employer: OVO

At OVO, we pride ourselves on being an exceptional employer that champions diversity and innovation in the fight against climate change. Our hybrid work culture fosters collaboration and creativity, with access to inspiring hub offices in Bristol, Glasgow, or London, where employees can connect and share ideas. We offer competitive salaries, extensive benefits including 34 days of holiday, health and wellbeing support, and a strong commitment to employee growth through mentorship and community engagement, making OVO a truly rewarding place to advance your career.

Contact Details:

OVO Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Staff Site Reliability Engineer role

✨Tip Number 1

Network like a pro! Get out there and connect with people in the industry. Attend meetups, conferences, or even local tech events. You never know who might be looking for someone just like you!

✨Tip Number 2

Show off your skills! Create a portfolio or GitHub repository showcasing your projects and contributions. This is a great way to demonstrate your expertise in automation, resilience, and observability.

✨Tip Number 3

Prepare for interviews by practising common SRE scenarios. Think about how you would handle incidents or optimise cloud costs. Being able to discuss your thought process will impress potential employers.

✨Tip Number 4

Apply through our website! It’s the best way to ensure your application gets seen. Plus, we love seeing candidates who are genuinely interested in joining our mission to tackle the climate crisis.

We think you need these skills to ace the Staff Site Reliability Engineer role

Site Reliability Engineering
Automation
Observability
Analytical Skills
Proactive Problem Solving
Collaboration
Monitoring Systems Management
Incident Management
Post-Mortem Analysis
Cost Optimisation
Capacity Planning
Performance Tuning
Infrastructure as Code (IaC)
Cloud Experience (AWS, GCP, Azure)
Programming (Python, TypeScript, Go, Java)
Data-Driven Decision Making

Some tips for your application 🫡

Tailor Your CV: Make sure your CV reflects the skills and experiences that align with the Site Reliability Engineer role. Highlight your experience with automation, resilience, and observability, as these are key aspects of the job.

Craft a Compelling Cover Letter: Use your cover letter to tell us why you're passionate about the climate crisis and how your background in software engineering can contribute to our mission at OVO. Be genuine and let your personality shine through!

Showcase Your Problem-Solving Skills: In your application, provide examples of how you've tackled complex issues in past roles. We love seeing candidates who can demonstrate their analytical and proactive approach to problem-solving.

Apply Through Our Website: We encourage you to apply directly through our website for the best chance of getting noticed. It’s the easiest way for us to keep track of your application and ensure it reaches the right people!

How to prepare for a job interview at OVO

✨Know Your Tech Stack

Make sure you’re well-versed in the programming languages and tools mentioned in the job description, like Python, Go, or Terraform. Brush up on your knowledge of cloud platforms such as GCP, AWS, or Azure, and be ready to discuss how you've used them in past projects.

✨Showcase Your Problem-Solving Skills

Prepare to share specific examples of complex debugging or incident management you've handled. Highlight your analytical mindset and how you approach problem-solving, especially in high-pressure situations. This will demonstrate your proactive nature and ability to maintain system reliability.

✨Emphasise Collaboration

Since this role requires working closely with various teams, think of examples where you’ve successfully collaborated with others. Discuss how you’ve shared knowledge or mentored peers, as this aligns with the collaborative spirit OVO values.

✨Understand Their Mission

Familiarise yourself with OVO's mission around climate change and Plan Zero. Be prepared to discuss how your skills can contribute to their goals, particularly in optimising cloud costs and enhancing system resilience. Showing that you care about their mission can set you apart from other candidates.
