At a Glance
- Tasks: Lead the charge in scaling our AWS and Kubernetes infrastructure while enhancing reliability.
- Company: Join Orgvue, a top-tier organisational design platform transforming how businesses operate.
- Benefits: Enjoy hybrid working, wellness perks, private medical insurance, and generous holiday allowance.
- Why this job: Make a real impact by shaping the future of our SaaS platform with cutting-edge technology.
- Qualifications: Proven SRE experience, strong AWS and Kubernetes skills, and a passion for automation.
- Other info: Be part of a diverse team that values individualism and offers excellent career growth.
The predicted salary is between £72,000 and £108,000 per year.
Orgvue is a leading organizational design and planning software platform that captures the power of data visualization and modelling to build more adaptable and better-performing organizations. HR, finance, and business leaders use Orgvue for actionable insight and analysis that helps them make faster workforce decisions in a constantly changing world. Orgvue is used by the world’s largest and best-known enterprises and management consulting firms to visualize and confidently build the businesses they want tomorrow, today. The company is headquartered in London, with offices in Philadelphia, The Hague, Toronto, and Sydney.
We are seeking a Principal Site Reliability Engineer who will be a senior technical leader focused on scaling and hardening our AWS- and Kubernetes-based infrastructure.
Responsibilities:
- Define and enforce SLOs, SLIs, and error budgets across critical services
- Craft and implement a cloud infrastructure and tooling strategy
- Work across the organization to level up SRE practices
- Help implement robust observability (metrics, logs, and traces) using our observability tool
- Guide the team in building automated, self-healing systems
- Own and evolve our incident response processes, including on-call practices and post-mortem culture
- Mentor engineers across the org on best practices in reliability, operational readiness, and scalable infrastructure
- Drive Infrastructure as Code (IaC) using Terraform, Kubernetes, CloudFormation and GitOps practices
- Collaborate closely with security, DevOps, and software teams to ensure compliance, scalability, and operational excellence
- Evaluate and introduce tools, patterns, and practices that improve the performance and reliability of our SaaS platform
Qualifications:
- Demonstrable experience leading SRE transformations
- Deep hands-on expertise with Kubernetes (EKS preferred) in production environments
- Strong experience with AWS core services (EC2, EKS, RDS, S3, ALB/NLB, IAM, CloudWatch, etc.)
- Expert in Infrastructure as Code using tools such as Terraform, with knowledge of GitOps workflows
- Strong background in observability: metrics, visualization, logging, and tracing
- Understanding of the SDLC, CI/CD pipelines, deployment automation, and blue/green or canary releases
- Proven experience with incident management, disaster recovery planning, root cause analysis, and post-incident reviews
Benefits:
- Hybrid working: at least 1 day a week in the London office
- Wellbeing: Sanctus Coaching, virtual fitness sessions, wellbeing webinars, annual wellbeing day
- Subsidised Gym Membership
- Private Medical Insurance (including Dental and Vision) and Life Assurance
- 25 days holiday (increasing to 30 days at a rate of 1 extra day per year)
- Summer Fridays (half-day Fridays for the months of July and August)
- Employer pension contribution of 5% of your gross salary, if you contribute a minimum of 3%
- Season Ticket Loan
- Cycle to Work Scheme
- Annual Discretionary Bonus
Here at Orgvue we promote individualism and a diverse workforce to build on our future success.
Principal Site Reliability Engineer in London. Employer: Orgvue Limited
Contact Details:
Orgvue Limited Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Principal Site Reliability Engineer role in London
✨Tip Number 1
Network like a pro! Reach out to folks in your industry on LinkedIn or at meetups. A friendly chat can lead to opportunities that aren’t even advertised yet.
✨Tip Number 2
Show off your skills! Create a portfolio or GitHub repo showcasing your projects, especially those related to AWS and Kubernetes. This gives potential employers a taste of what you can do.
✨Tip Number 3
Prepare for interviews by practising common SRE scenarios. Think about how you’d handle incidents or improve system reliability. We want to see your problem-solving skills in action!
✨Tip Number 4
Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, we love seeing candidates who are genuinely interested in joining us.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV reflects the skills and experiences that align with the Principal Site Reliability Engineer role. Highlight your hands-on expertise with Kubernetes and AWS, and don’t forget to mention any SRE transformations you've led!
Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to tell us why you're passionate about site reliability engineering and how your background makes you a perfect fit for our team. Be sure to mention specific projects or achievements that showcase your skills.
Showcase Your Technical Skills: In your application, be clear about your technical proficiencies, especially in Infrastructure as Code and observability tools. We want to see how you’ve used these skills in real-world scenarios, so don’t hold back on the details!
Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it shows us you’re keen on joining our team at Orgvue!
How to prepare for a job interview at Orgvue Limited
✨Know Your Tech Inside Out
Make sure you’re well-versed in AWS and Kubernetes, as these are crucial for the role. Brush up on your experience with specific services like EC2, EKS, and RDS, and be ready to discuss how you've used them in production environments.
✨Showcase Your SRE Experience
Prepare to talk about your past experiences leading SRE transformations. Highlight specific projects where you defined SLOs, implemented observability metrics, or improved incident response processes. Real-world examples will make your case stronger.
✨Demonstrate Your Problem-Solving Skills
Be ready to discuss how you approach incident management and disaster recovery planning. Share stories of past incidents, what you learned, and how you’ve evolved processes to prevent future issues. This shows your proactive mindset.
✨Cultural Fit Matters
Orgvue values individualism and diversity, so be yourself! Show enthusiasm for their mission and how you can contribute to a positive team culture. Ask questions about their practices and values to demonstrate your interest in fitting in.