Principal Site Reliability Engineer

Full-Time | £72,000 - £108,000 / year (est.) | Home office (partial)

At a Glance

  • Tasks: Lead the charge in scaling and securing our AWS and Kubernetes infrastructure.
  • Company: Join Orgvue, a top-tier organisational design software platform with a global presence.
  • Benefits: Enjoy hybrid working, wellness perks, private medical insurance, and generous holiday allowance.
  • Why this job: Make a real impact by enhancing reliability and operational excellence in a dynamic tech environment.
  • Qualifications: Proven SRE experience with strong Kubernetes and AWS skills.
  • Other info: Be part of a diverse team that values individualism and career growth.

The predicted salary is between £72,000 and £108,000 per year.

Orgvue is a leading organizational design and planning software platform that captures the power of data visualization and modelling to build more adaptable and better-performing organizations. HR, finance, and business leaders use Orgvue for actionable insight and analysis that helps them make faster workforce decisions in a constantly changing world. Orgvue is used by the world’s largest and best-known enterprises and management consulting firms to visualize and confidently build the businesses they want tomorrow, today. The company is headquartered in London, with offices in Philadelphia, The Hague, Toronto, and Sydney.

We are seeking a Principal Site Reliability Engineer who will be a senior technical leader focused on scaling and hardening our AWS- and Kubernetes-based infrastructure.

Responsibilities
  • Define and enforce SLOs, SLIs, and error budgets across critical services
  • Craft and implement a cloud infrastructure and tooling strategy
  • Work across the organisation to level up SRE practices
  • Help implement robust observability (metrics, logs, and traces) using our observability tool
  • Guide the team in building automated, self-healing systems
  • Own and evolve our incident response processes, including on-call practices and post-mortem culture
  • Mentor engineers across the org on best practices in reliability, operational readiness, and scalable infrastructure
  • Drive Infrastructure as Code (IaC) using Terraform, Kubernetes, CloudFormation and GitOps practices
  • Collaborate closely with security, DevOps, and software teams to ensure compliance, scalability, and operational excellence
  • Evaluate and introduce tools, patterns, and practices that improve the performance and reliability of our SaaS platform

Qualifications
  • Demonstrable experience leading SRE transformations
  • Deep hands-on expertise with Kubernetes (EKS preferred) in production environments
  • Strong experience with AWS core services (EC2, EKS, RDS, S3, ALB/NLB, IAM, CloudWatch, etc.)
  • Expert in Infrastructure as Code using tools such as Terraform, with knowledge of GitOps workflows
  • Strong background in observability: metrics, visualization, logging, and tracing
  • Understanding of automation, SDLC, CI/CD pipelines, deployment automation, and blue/green or canary releases
  • Proven experience with incident management, disaster recovery planning, root cause analysis, and post-incident reviews

Benefits
  • Hybrid working - 1+ days a week in the London office
  • Wellbeing: Sanctus Coaching, Virtual fitness sessions, Wellbeing webinars, Annual Wellbeing day
  • Subsidised Gym Membership
  • Private Medical Insurance (including Dental and Vision) and Life Assurance
  • 25 days holiday (increasing to 30 days at a rate of 1 extra day per year)
  • Summer Fridays (half-day Fridays for the months of July and August)
  • Employer pension contribution of 5% of your gross salary, if you contribute a minimum of 3%
  • Season Ticket Loan
  • Cycle to Work Scheme
  • Annual Discretionary Bonus

Here at Orgvue we promote individualism and a diverse workforce to build on our future success.

Principal Site Reliability Engineer employer: Orgvue Limited

Orgvue is an exceptional employer that champions innovation and individual growth within a dynamic work environment. With a strong focus on employee wellbeing, including hybrid working options and comprehensive health benefits, Orgvue fosters a culture of collaboration and mentorship, empowering its team members to excel in their roles. Located in London, employees benefit from a vibrant city atmosphere while enjoying unique perks such as Summer Fridays and an annual discretionary bonus, making it a truly rewarding place to advance your career in site reliability engineering.

Contact Details:

Orgvue Limited Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Principal Site Reliability Engineer role

✨Network Like a Pro

Get out there and connect with folks in the industry! Attend meetups, webinars, or even just grab a coffee with someone who works at Orgvue. Building relationships can open doors that a CV just can't.

✨Show Off Your Skills

When you get the chance to chat with potential employers, make sure to highlight your hands-on experience with AWS and Kubernetes. Share specific examples of how you've tackled challenges in your previous roles – it’ll show them you’re the real deal!

✨Be Ready for Technical Challenges

Prepare for technical interviews by brushing up on your SRE knowledge and practices. Think about how you would define SLOs or handle incident management. Being able to discuss these topics confidently will set you apart from the crowd.

✨Apply Through Our Website

Don’t forget to apply directly through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in joining the Orgvue team.

We think you need these skills to ace the Principal Site Reliability Engineer role

Site Reliability Engineering
AWS Core Services
Kubernetes (EKS preferred)
Infrastructure as Code (IaC)
Terraform
GitOps
Observability (metrics, visualization, logging, tracing)
Automation
SDLC
CI/CD Pipelines
Incident Management
Disaster Recovery Planning
Root Cause Analysis
Post-Incident Reviews
Mentoring

Some tips for your application 🫡

Tailor Your CV: Make sure your CV is tailored to the Principal Site Reliability Engineer role. Highlight your experience with AWS, Kubernetes, and Infrastructure as Code. We want to see how your skills align with what we're looking for!

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're passionate about SRE and how you can contribute to our team at Orgvue. Keep it concise but impactful – we love a good story!

Showcase Your Achievements: When detailing your experience, focus on specific achievements that demonstrate your expertise in SRE transformations and incident management. Numbers and results speak volumes, so don’t hold back!

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it’s super easy – just a few clicks and you’re done!

How to prepare for a job interview at Orgvue Limited

✨Know Your Tech Inside Out

Make sure you’re well-versed in AWS and Kubernetes, and be ready to draw on any EKS experience you have. Brush up on your knowledge of core services like EC2, RDS, and CloudWatch, as these will likely come up during the interview.

✨Showcase Your SRE Experience

Be ready to discuss your past experiences leading SRE transformations. Prepare specific examples where you defined SLOs, implemented observability metrics, or improved incident response processes. This will demonstrate your hands-on expertise.

✨Emphasise Collaboration Skills

Since the role involves working closely with security, DevOps, and software teams, highlight your ability to collaborate effectively. Share examples of how you’ve worked cross-functionally to achieve operational excellence.

✨Prepare for Scenario Questions

Expect scenario-based questions that test your problem-solving skills. Think about how you would handle incidents, disaster recovery planning, or implementing Infrastructure as Code. Practising these scenarios can help you articulate your thought process clearly.
