loveholidays

Lead Site Reliability Engineer

London | Full-Time | £60,000 - £84,000 / year (est.) | No home office possible

At a Glance

Tasks: Lead the evolution of SRE practices and enhance system reliability.
Company: Join loveholidays, a fast-growing online travel agency revolutionising holiday bookings.
Benefits: Enjoy 25 days annual leave, discounted holidays, and a training budget for personal growth.
Why this job: Be at the forefront of tech innovation in travel, impacting millions of dream holidays.
Qualifications: Experience in SRE practices and a passion for performance and reliability are essential.
Other info: Opportunity to work with cutting-edge technologies and contribute to open source projects.

The predicted salary is between £60,000 and £84,000 per year.

We are a rapidly growing online travel agency with technology at the core of our success. In 2022, we sent millions of people on their dream holidays. Handling a million visitors daily, our platform supports over 100 services and processes 8,000 requests per second, with a p95 search latency of 150 ms. Our observability infrastructure captures and processes 1 TB of logs daily and 350,000 metric samples per second. We set ourselves apart through open source work: open sourcing internal tools, contributing to public repositories, and sponsoring conferences.

Responsibilities

As our first Site Reliability Engineer, you will help evolve SRE practices such as incident management, blameless postmortems, SLOs, and error budgets.
Your role will involve building reliable, performant, auto-scalable, and highly available systems with support from the existing Platform Infrastructure team.
Enhance SRE practices across teams.
Improve reliability KPIs of the platform.
Balance reliability with feature delivery using SLOs and error budgets.
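
The balance between reliability and feature delivery can be made concrete with a little arithmetic. The following is a minimal sketch in Python, assuming a request-based availability SLO; the function names and the figures in the usage note are illustrative and do not come from loveholidays.

```python
def error_budget(slo_target: float, total_requests: int) -> float:
    """Number of requests that may fail in the window without breaching the SLO."""
    return (1.0 - slo_target) * total_requests


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative once the SLO is breached)."""
    budget = error_budget(slo_target, total_requests)
    if budget == 0:
        raise ValueError("a 100% SLO leaves no error budget")
    return 1.0 - failed_requests / budget
```

For example, a hypothetical 99.9% SLO over 1,000,000 requests allows roughly 1,000 failures. After 250 failed requests about three quarters of the budget remains, which a team could read as room to keep shipping features; a negative result would argue for prioritising reliability work instead.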

Our engineering teams manage the entire lifecycle of services from initial development to high-load production operation. Your responsibility is to enable engineering teams to succeed in operations, not to run their services for them.

What you’ll be working on

Kick-start our SRE function by promoting reliability best practices and processes.
Identify slow code paths in critical applications using tools like Java Flight Recorder or Go’s pprof.
Develop or modify tools and applications with reliability and performance in mind.
Ensure systems can handle ten times the current load by improving performance testing.
Reduce mean time to discovery and recovery through enhanced observability and alerting.
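
Mean time to discovery and recovery can be tracked from incident timestamps. The following is a minimal sketch in Python, assuming each incident record carries hypothetical `started`, `detected`, and `resolved` timestamps, and measuring recovery from detection to resolution (definitions vary between teams).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    started: datetime   # when the fault began
    detected: datetime  # when an alert fired or someone noticed
    resolved: datetime  # when service was restored


def mean_delta(deltas: list[timedelta]) -> timedelta:
    """Arithmetic mean of a non-empty list of durations."""
    if not deltas:
        raise ValueError("at least one incident is required")
    return sum(deltas, timedelta()) / len(deltas)


def mean_time_to_discovery(incidents: list[Incident]) -> timedelta:
    """Average delay between a fault starting and it being detected."""
    return mean_delta([i.detected - i.started for i in incidents])


def mean_time_to_recovery(incidents: list[Incident]) -> timedelta:
    """Average delay between detection and restored service."""
    return mean_delta([i.resolved - i.detected for i in incidents])
```

Tracking both numbers separately shows whether an improvement came from better alerting (faster discovery) or from better tooling and runbooks (faster recovery).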

We focus heavily on observability, continuously evolving our monitoring and alerting stack centered around the Mimir ecosystem (Prometheus, Grafana, Loki, Tempo). Our service mesh (Linkerd) provides uniform observability of all production services at 10-second intervals. Performance and scalability are fundamental to our development process, achieved by combining core computer science principles with cutting-edge cloud technologies.

Perform low-level debugging and troubleshooting.

What we’ll give back to you

Company pension contributions at 5%.
Training budget to support your ongoing learning and development.
Discounted holidays for you, your family, and friends.
25 days of annual leave plus 8 public holidays, increasing by 1 day every two years up to 30 days.
Option to buy or sell annual leave.
Cycle-to-work scheme, season ticket loans, and eye care vouchers.

About the company

loveholidays offers a personalized approach to searching for your next getaway, allowing you to customize your holiday with maximum flexibility. Rest assured, your holiday is ATOL protected. We offer various payment options to ensure a seamless booking experience.

Lead Site Reliability Engineer employer: loveholidays

At loveholidays, we pride ourselves on being an exceptional employer, fostering a vibrant work culture that champions innovation and collaboration. As a Lead Site Reliability Engineer, you will not only play a pivotal role in shaping our SRE practices but also benefit from generous training budgets, competitive pension contributions, and a flexible holiday policy that rewards your commitment. Join us in our dynamic London office, where your contributions will directly impact millions of travellers while enjoying discounted holidays and a supportive environment for personal and professional growth.

Contact Details:

loveholidays Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land Lead Site Reliability Engineer

✨Tip Number 1

Familiarise yourself with the specific tools and technologies mentioned in the job description, such as Prometheus, Grafana, and Java Flight Recorder. Having hands-on experience or projects showcasing your skills with these tools can set you apart from other candidates.

✨Tip Number 2

Engage with the open source community by contributing to relevant projects or sharing your own tools. This aligns with loveholidays' emphasis on open source contributions and demonstrates your commitment to the principles the company values.

✨Tip Number 3

Prepare to discuss your experience with incident management and blameless postmortems during the interview. The role calls for candidates who can articulate their approach to improving reliability and learning from failures.

✨Tip Number 4

Showcase your understanding of SLOs and error budgets in your discussions. Being able to explain how you would balance reliability with feature delivery will demonstrate that you grasp the core responsibilities of the role.

We think you need these skills to ace Lead Site Reliability Engineer

Site Reliability Engineering (SRE)

Incident Management

Blameless Postmortems

Service Level Objectives (SLOs)

Error Budgets

Performance Testing

Observability Tools (Prometheus, Grafana, Loki, Tempo)

Java Flight Recorder

Go’s pprof

Cloud Technologies

Low-Level Debugging

Troubleshooting Skills

Monitoring and Alerting

Scalability Principles

Collaboration with Engineering Teams

Some tips for your application 🫡

Tailor Your CV: Make sure your CV highlights relevant experience in site reliability engineering, particularly any work with observability tools like Prometheus or Grafana. Emphasise your contributions to open source projects and any experience with incident management.

Craft a Compelling Cover Letter: In your cover letter, express your passion for SRE practices and how you can contribute to the company's goals. Mention specific examples of how you've improved system reliability or performance in previous roles.

Showcase Technical Skills: Clearly outline your technical skills related to performance testing, debugging, and cloud technologies. Include any certifications or training that demonstrate your expertise in these areas.

Highlight Team Collaboration: Since the role involves working closely with engineering teams, provide examples of how you've successfully collaborated with others to enhance system reliability and performance. This will show your ability to enable teams rather than just manage services.

How to prepare for a job interview at loveholidays

✨Understand the SRE Principles

Familiarise yourself with key Site Reliability Engineering concepts such as incident management, blameless postmortems, SLOs, and error budgets. Be prepared to discuss how you would implement these practices in a real-world scenario.

✨Showcase Your Technical Skills

Be ready to demonstrate your proficiency with tools like Java Flight Recorder or Go’s pprof. Discuss your experience in performance testing and how you've improved system reliability in previous roles.

✨Emphasise Observability Experience

Since the company focuses heavily on observability, highlight any experience you have with monitoring and alerting stacks, particularly with tools like Prometheus, Grafana, and Loki. Share specific examples of how you've enhanced observability in past projects.

✨Prepare for Problem-Solving Questions

Expect to face technical challenges during the interview. Prepare to walk through your thought process in debugging and troubleshooting scenarios, showcasing your analytical skills and ability to think critically under pressure.

loveholidays

200-500
