Principal Site Reliability Engineer

London | Full-Time | £54,000 - £84,000 per year (est.) | Home office (partial)

At a Glance

  • Tasks: Lead the charge in scaling and securing our AWS and Kubernetes infrastructure.
  • Company: Join Orgvue, a cutting-edge platform transforming workforce planning for top enterprises worldwide.
  • Benefits: Enjoy hybrid working, wellbeing initiatives, subsidised gym membership, and generous holiday allowance.
  • Why this job: Be part of a dynamic team shaping a world-class reliability culture in a fast-paced environment.
  • Qualifications: Bring your expertise in SRE transformations, Kubernetes, AWS, and Infrastructure as Code.
  • Other info: Embrace a diverse workplace that values individualism and promotes growth.

The predicted salary is between £54,000 and £84,000 per year.

Orgvue is an organisational design and planning platform that empowers your business to transform its workforce by understanding the work people do and the skills they have. Our platform connects strategy to structure, providing clarity of vision, so you can build a more adaptable, better performing organisation that thrives in a constantly changing world of work. The world’s largest and best-known enterprises and consulting firms use Orgvue to visualise and model current and future states of the organisation and make faster, more informed decisions. The company is headquartered in London, with offices in Philadelphia, The Hague, Toronto, and Sydney.

As a Principal Site Reliability Engineer, you will be a senior technical leader focused on scaling and hardening our AWS- and Kubernetes-based infrastructure. You will work across product, platform, and operations teams to ensure our systems are reliable, observable, and resilient, even at scale. This role combines hands-on technical capability with strategic vision, helping us build a world-class reliability culture and a robust engineering foundation for growth. We’re looking for someone with strong technical expertise who is a great communicator and enjoys collaborating across multiple teams.

As a Principal Site Reliability Engineer, you will:

  • Define and enforce SLOs, SLIs, and error budgets across critical services
  • Craft and implement a cloud infrastructure and tooling strategy
  • Work across our organisation to level up SRE practices
  • Help implement robust observability (metrics, logs, and traces) using our observability tools
  • Guide the team in building automated, self-healing systems
  • Own and evolve our incident response processes, including on-call practices and post-mortem culture
  • Mentor engineers across the organisation on best practices in reliability, operational readiness, and scalable infrastructure
  • Drive Infrastructure as Code (IaC) using Terraform, Kubernetes, CloudFormation, and GitOps practices
  • Collaborate closely with security, DevOps, and software teams to ensure compliance, scalability, and operational excellence
  • Evaluate and introduce tools, patterns, and practices that improve the performance and reliability of our SaaS platform

Desired Skills & Experience:

  • Demonstrable experience leading SRE transformations
  • Deep hands-on expertise with Kubernetes (EKS preferred) in production environments
  • Strong experience with AWS core services (EC2, EKS, RDS, S3, ALB/NLB, IAM, CloudWatch, etc.)
  • Expertise in Infrastructure as Code using tools such as Terraform, with knowledge of GitOps workflows
  • Strong background in observability: metrics, visualization, logging, and tracing
  • Understanding of automation, SDLC, CI/CD pipelines, deployment automation, and blue/green or canary releases
  • Proven experience with incident management, disaster recovery planning, root cause analysis, and post-incident reviews

Benefits:

  • Hybrid working: 1+ days a week in the London office
  • Wellbeing initiatives, including Sanctus Coaching, virtual fitness sessions, wellbeing webinars, and an annual Wellbeing Day
  • Subsidised gym membership
  • Private medical insurance (including dental and vision) and life assurance
  • 25 days' holiday, increasing by 1 extra day per year up to 30 days
  • Summer Fridays (half-day Fridays in July and August)
  • Employer pension contribution of 5% of your gross salary, provided you contribute at least 3%
  • Season ticket loan
  • Cycle to Work Scheme
  • Annual discretionary bonus

Here at Orgvue, we promote individualism and a diverse workforce to build on our future success.

Principal Site Reliability Engineer employer: Orgvue

Orgvue is an exceptional employer that fosters a culture of collaboration and innovation, making it an ideal place for a Principal Site Reliability Engineer to thrive. With a strong focus on employee wellbeing, including initiatives like Sanctus Coaching and subsidised gym membership, alongside generous holiday allowances and a commitment to professional growth, Orgvue empowers its team members to excel in their roles while maintaining a healthy work-life balance. Its London base offers a dynamic environment that encourages creativity and adaptability, so that every individual can contribute meaningfully to the company's mission.

Contact Details:

Orgvue Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Principal Site Reliability Engineer role

✨Tip Number 1

Familiarise yourself with Orgvue's platform and its features. Understanding how their organisational design and planning tools work will help you articulate how your skills can enhance their infrastructure and reliability.

✨Tip Number 2

Network with current or former employees of Orgvue on platforms like LinkedIn. Engaging in conversations about their experiences can provide valuable insights into the company culture and expectations for the Principal Site Reliability Engineer role.

✨Tip Number 3

Prepare to discuss specific examples of your experience with AWS and Kubernetes during interviews. Highlighting your hands-on expertise and successful projects will demonstrate your capability to lead SRE transformations effectively.

✨Tip Number 4

Showcase your communication and collaboration skills by preparing scenarios where you've worked across teams. This role requires a strong ability to mentor and guide others, so be ready to share how you've successfully done this in the past.

We think you need these skills to ace the Principal Site Reliability Engineer role

  • AWS core services (EC2, EKS, RDS, S3, ALB/NLB, IAM, CloudWatch)
  • Kubernetes (EKS preferred)
  • Infrastructure as Code (IaC) using Terraform
  • GitOps workflows
  • Observability (metrics, visualization, logging, tracing)
  • Automation and CI/CD pipelines
  • Incident management and disaster recovery planning
  • Root cause analysis and post-incident reviews
  • Strong communication skills
  • Collaboration across multiple teams
  • Strategic vision for reliability culture
  • Experience in leading SRE transformations
  • Ability to mentor engineers on best practices
  • Understanding of deployment automation and release strategies

Some tips for your application 🫡

Understand the Role: Before applying, make sure you fully understand the responsibilities and requirements of the Principal Site Reliability Engineer position. Familiarise yourself with Orgvue's platform and how your skills align with their needs.

Tailor Your CV: Customise your CV to highlight relevant experience in AWS, Kubernetes, and SRE transformations. Use specific examples that demonstrate your technical expertise and leadership capabilities in similar roles.

Craft a Compelling Cover Letter: Write a cover letter that showcases your passion for reliability engineering and your strategic vision. Mention how your background aligns with Orgvue's mission and how you can contribute to building a world-class reliability culture.

Highlight Collaboration Skills: Since the role involves working across multiple teams, emphasise your communication and collaboration skills. Provide examples of successful projects where you worked with diverse teams to achieve common goals.

How to prepare for a job interview at Orgvue

✨Showcase Your Technical Expertise

As a Principal Site Reliability Engineer, it's crucial to demonstrate your deep hands-on expertise with Kubernetes and AWS services. Be prepared to discuss specific projects where you've successfully implemented these technologies, focusing on the challenges you faced and how you overcame them.

✨Communicate Your Strategic Vision

This role requires a blend of technical skills and strategic thinking. During the interview, articulate your vision for building a reliability culture and how you plan to implement SLOs, SLIs, and error budgets. Use examples from your past experiences to illustrate your approach.

✨Emphasise Collaboration Skills

Collaboration across teams is key in this position. Highlight your experience working with cross-functional teams, such as DevOps and security, and provide examples of how you've successfully driven initiatives that required teamwork and communication.

✨Prepare for Scenario-Based Questions

Expect scenario-based questions that assess your problem-solving abilities in real-world situations. Prepare to discuss how you would handle incidents, implement disaster recovery plans, or improve observability metrics. This will showcase your practical knowledge and readiness for the role.
