Lead Site Reliability Engineer - Cloud

Bristol, Full-Time, £48,000 - £72,000 per year (est.), hybrid working

At a Glance

  • Tasks: Lead the design and operation of cloud infrastructure on Google Cloud Platform.
  • Company: Join a dynamic team focused on technology transformation and cloud services.
  • Benefits: Enjoy a competitive pension, annual bonuses, share schemes, and 30 days leave.
  • Why this job: Be part of a pivotal role in enhancing system reliability and automation.
  • Qualifications: Experience in SRE, cloud platforms, Kubernetes, and scripting languages required.
  • Other info: Hybrid working model with flexible policies to support diverse backgrounds.

The predicted salary is between £48,000 and £72,000 per year.

We are seeking an experienced and technically proficient Lead Site Reliability Engineer to join a growing team focused on delivering reliable, scalable, and secure cloud-based services. This is an excellent opportunity to play a pivotal role in one of the organisation's key technology transformation programmes.

As a Lead Site Reliability Engineer, you will contribute to the design, development, and operation of cloud infrastructure and applications on Google Cloud Platform. You will work collaboratively with engineering and infrastructure teams to implement site reliability engineering (SRE) principles, focusing on system reliability, observability, automation, and operational excellence. This role follows a hybrid working model, requiring attendance at the Bristol office for at least two days per week or 40% of the working time.

Key Responsibilities
  • Promote and embed SRE best practices within engineering teams and microservices environments
  • Partner with infrastructure and DevOps engineers to improve system resilience and performance
  • Troubleshoot complex incidents and implement long-term solutions through code and automation
  • Develop and improve automation pipelines to reduce manual operations and enhance system efficiency
  • Contribute to multiple strategic digital initiatives and collaborate across engineering domains

Essential Skills and Experience
  • Background in software engineering or telemetry, with a current focus on SRE
  • Extensive experience with public cloud platforms, particularly Google Cloud (or AWS/Azure)
  • Proven ability to manage Kubernetes clusters in production environments
  • Competence in scripting and development using languages such as Python, Java, Go, Bash, or PowerShell
  • Strong understanding of service-level objectives (SLOs), indicators (SLIs), and monitoring practices
  • Hands-on experience with infrastructure as code (e.g., Terraform) and CI/CD tools (e.g., Jenkins, Azure DevOps)

Desirable Knowledge
  • Familiarity with observability and performance tools such as Dynatrace, Google Cloud Monitoring (formerly Stackdriver), or similar
  • Exposure to cost monitoring, logging frameworks, and cloud consumption analytics

Personal Attributes
  • Ability to mentor and support engineers in adopting SRE methodologies
  • Logical and structured problem-solving approach
  • Excellent collaboration and communication skills within cross-functional teams
  • Strong awareness of the software development lifecycle and agile delivery practices

What We Offer

  • Competitive pension contribution of up to 15%
  • Annual performance-based bonus
  • Company share schemes, including free shares
  • 30 days of annual leave plus bank holidays
  • A broad selection of benefits tailored to lifestyle, wellbeing, and personal circumstances
  • Inclusive policies, including enhanced parental leave and workplace flexibility

We are committed to creating an inclusive working environment that supports diversity in all forms. We welcome applications from all backgrounds and offer reasonable adjustments throughout the recruitment process.

Lead Site Reliability Engineer - Cloud employer: Robert Walters

Join a forward-thinking organisation as a Lead Site Reliability Engineer in Bristol, where you will be at the forefront of cloud technology transformation. We pride ourselves on fostering a collaborative and inclusive work culture that prioritises employee growth through mentorship and continuous learning opportunities. With competitive benefits including a generous pension scheme, performance bonuses, and a commitment to work-life balance, we offer a rewarding environment for those looking to make a meaningful impact in the tech industry.

Contact Details:

Robert Walters Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Lead Site Reliability Engineer - Cloud role

✨Tip Number 1

Familiarise yourself with Google Cloud Platform and its services. The role centres on GCP, although the posting also accepts AWS or Azure experience, so hands-on experience or relevant certifications can set you apart from other candidates.

✨Tip Number 2

Showcase your experience with Kubernetes and automation tools. Be prepared to discuss specific projects where you've managed Kubernetes clusters or developed automation pipelines, as these are crucial for the role.

✨Tip Number 3

Highlight your understanding of SRE principles and practices. Be ready to explain how you've implemented service-level objectives (SLOs) and monitoring practices in previous roles, as this will demonstrate your alignment with the team's goals.

✨Tip Number 4

Prepare to discuss your collaboration skills. This role involves working closely with cross-functional teams, so think of examples where you've successfully partnered with engineers or DevOps teams to improve system resilience and performance.

We think you need these skills to ace the Lead Site Reliability Engineer - Cloud role

Site Reliability Engineering (SRE)
Cloud Infrastructure Management
Google Cloud Platform (GCP)
Kubernetes Management
Scripting Languages (Python, Java, Go, Bash, PowerShell)
Service-Level Objectives (SLOs) and Indicators (SLIs)
Monitoring Practices
Infrastructure as Code (Terraform)
CI/CD Tools (Jenkins, Azure DevOps)
Automation Pipeline Development
Incident Troubleshooting
Collaboration and Communication Skills
Agile Delivery Practices
Mentoring and Support for Engineers
Logical Problem-Solving

Some tips for your application 🫡

Tailor Your CV: Make sure your CV highlights your experience with cloud platforms, particularly Google Cloud, and your proficiency in SRE principles. Use specific examples that demonstrate your skills in managing Kubernetes clusters and developing automation pipelines.

Craft a Compelling Cover Letter: In your cover letter, express your enthusiasm for the role and the company. Mention how your background in software engineering aligns with their needs and how you can contribute to their technology transformation programmes.

Showcase Relevant Skills: Clearly outline your technical skills related to scripting languages like Python or Java, and your experience with infrastructure as code tools such as Terraform. Highlight any familiarity with observability tools and CI/CD practices.

Prepare for Technical Questions: Anticipate technical questions related to SRE best practices, system reliability, and troubleshooting complex incidents. Be ready to discuss your problem-solving approach and how you've mentored others in adopting SRE methodologies.

How to prepare for a job interview at Robert Walters

✨Showcase Your SRE Knowledge

Be prepared to discuss your understanding of site reliability engineering principles. Highlight your experience with SLOs, SLIs, and how you've implemented these in past roles. This will demonstrate your technical proficiency and alignment with the company's focus.

✨Demonstrate Cloud Expertise

Since the role calls for extensive public cloud experience, particularly with Google Cloud Platform, be ready to share specific examples of projects you've worked on. Discuss your familiarity with Kubernetes and any challenges you've overcome while managing clusters in production environments.

✨Emphasise Automation Skills

Talk about your experience with automation pipelines and infrastructure as code. Mention tools like Terraform and CI/CD practices you've implemented, as this aligns with the company's goal of enhancing system efficiency through automation.

✨Prepare for Problem-Solving Scenarios

Expect to face hypothetical scenarios or case studies during the interview. Practice articulating your logical and structured approach to troubleshooting complex incidents, as well as how you would implement long-term solutions through code and automation.
