At a Glance
- Tasks: Join us as a Senior Site Reliability Engineer, enhancing our SRE practices and system reliability.
- Company: We're loveholidays, a fast-growing online travel agency making dream holidays a reality.
- Benefits: Enjoy a 5% pension contribution, training budget, discounted holidays, and up to 30 days off.
- Why this job: Be part of a tech-driven culture that values open source and innovation in travel.
- Qualifications: Experience in SRE practices, performance testing, and observability tools is essential.
- Other info: Work with cutting-edge cloud technologies and contribute to a vibrant engineering community.
The predicted salary is between £43,200 and £72,000 per year.
We are a rapidly growing online travel agency with technology at the heart of our success. In 2022, we sent millions of people on their dream holiday. With a million visitors a day, our 100+ services handle 8k requests per second, while maintaining p95 search latency of 150ms. Our observability captures and processes 1TB of logs a day and 350k metric samples a second. We focus on differentiation by relying heavily on open source, while also giving back through contributions to public repositories, open sourcing in-house tools and sponsoring conferences.
Responsibilities
- Contribute to the evolution of SRE practices like incident management, blameless postmortems, SLOs and error budgets.
- Build reliable, performant, auto-scalable and highly available systems.
- Support the existing Platform Infrastructure team.
- Level up SRE practices across the teams.
- Improve reliability KPIs of the platform.
- Help balance reliability with feature delivery using SLOs and error budgets.
Our engineering teams own the lifecycle of services from first commit to high-load operation in production. Your responsibility will be to help engineering teams succeed at operations, not to run their services for them.
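The balance between reliability and feature delivery mentioned in the responsibilities comes down to simple arithmetic. The following is a minimal sketch, not loveholidays' actual tooling: the function names, the 99.9% target and the 30-day window are illustrative assumptions.

```python
from datetime import timedelta


def error_budget(slo_target: float, window: timedelta) -> timedelta:
    """Return the downtime allowed in `window` for an availability SLO.

    `slo_target` is a fraction such as 0.999 (hypothetical example value).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return window * (1.0 - slo_target)


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of a request-based error budget still unspent.

    A negative result means the budget is exhausted.
    """
    allowed_failures = total_requests * (1.0 - slo_target)
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Assumed example: 99.9% availability over a 30-day window.
    print(error_budget(0.999, timedelta(days=30)))
    # Assumed example: 250 failures out of 1,000,000 requests.
    print(round(budget_remaining(0.999, 1_000_000, 250), 3))
```

A team that still has most of its budget left can ship riskier changes; a team that has spent it would typically prioritise reliability work until the window rolls over.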
What you’ll be working on
- Exposing slow running code paths in critical applications using tools like Java Flight Recorder or Go’s pprof.
- Writing tools or modifying existing applications with reliability and performance in mind.
- Ensuring our systems and their individual components can withstand 10x load by improving our performance testing.
- Shortening mean time to discovery and recovery with improvements to observability and alerting.
We place a strong focus on observability, continually evolving our monitoring and alerting stack, currently centred around the Mimir (Prometheus), Grafana, Loki and Tempo ecosystem. Our service mesh (Linkerd) provides uniform observability of all production services at 10s intervals. Performance and scalability are integral to our software and infrastructure development process, achieved by combining Computer Science fundamentals and cutting-edge cloud technologies. The role also involves low-level debugging and troubleshooting.
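Tail-latency figures such as a p95 are derived from raw request timings. The sketch below uses the nearest-rank method as one simple way to compute a percentile; it is an illustration only, and the function name and sample values are hypothetical. Production stacks such as Prometheus usually estimate quantiles from histogram buckets instead.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Return the `pct`-th percentile of `samples` using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest 1-based rank covering at least pct% of samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical latencies in milliseconds: 10, 20, ..., 200.
    latencies_ms = [float(ms) for ms in range(10, 210, 10)]
    print(percentile_nearest_rank(latencies_ms, 95))
```

The result is always one of the observed samples, which keeps the calculation easy to check by hand.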
What we’ll give back to you
- Company pension contributions at 5%.
- Training budget for you to learn on the job and level yourself up.
- Discounted holidays for you, your family and friends.
- 25 days of holiday per annum (plus 8 public holidays), increasing by 1 day for every second year of service, up to a maximum of 30 days per annum.
- Ability to buy and sell annual leave.
- Cycle to work scheme, season ticket loan and eye care vouchers.
About the company: loveholidays offer a bespoke way of searching for your next getaway, giving you the chance to personalise your holiday with the ultimate flexibility. Plus, book confidently knowing your holiday is ATOL protected.
Senior Site Reliability Engineer employer: loveholidays
Contact details:
loveholidays Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Senior Site Reliability Engineer role
✨Tip Number 1
Familiarise yourself with the tools and technologies mentioned in the job description, such as Java Flight Recorder, Go’s pprof, and the Mimir ecosystem. Having hands-on experience or projects showcasing your skills with these tools can set you apart during discussions.
✨Tip Number 2
Engage with the open-source community around the technologies loveholidays uses. Contributing to public repositories or participating in discussions can demonstrate your commitment to the field and help you network with professionals who might influence hiring decisions.
✨Tip Number 3
Prepare to discuss your experience with incident management and blameless postmortems. Be ready to share specific examples of how you've implemented SLOs and error budgets in previous roles, as this aligns closely with the expectations for the Senior Site Reliability Engineer position.
✨Tip Number 4
Showcase your understanding of balancing reliability with feature delivery. Think of scenarios where you've had to make trade-offs between performance and new features, and be prepared to discuss how you approached those challenges.
Some tips for your application 🫡
Understand the Role: Before applying, make sure you fully understand the responsibilities and requirements of the Senior Site Reliability Engineer position. Familiarise yourself with SRE practices, observability tools, and performance testing methodologies mentioned in the job description.
Tailor Your CV: Customise your CV to highlight relevant experience and skills that align with the job description. Emphasise your knowledge of open source technologies, incident management, and any previous work with performance optimisation or reliability engineering.
Craft a Compelling Cover Letter: Write a cover letter that showcases your passion for technology and travel. Discuss how your background in SRE can contribute to the company's goals, particularly in improving reliability KPIs and enhancing observability.
Showcase Relevant Projects: If you have worked on projects involving observability stacks like Prometheus, Grafana, or similar tools, be sure to mention these in your application. Providing specific examples of how you've improved system performance or reliability will strengthen your application.
How to prepare for a job interview at loveholidays
✨Understand the SRE Practices
Familiarise yourself with key SRE concepts such as incident management, blameless postmortems, SLOs, and error budgets. Be prepared to discuss how you have implemented or improved these practices in your previous roles.
✨Showcase Your Technical Skills
Be ready to demonstrate your expertise in performance testing and observability tools like Prometheus, Grafana, and others mentioned in the job description. Prepare examples of how you've used these tools to enhance system reliability.
✨Discuss Load Handling Strategies
Since the role involves ensuring systems can withstand increased loads, be prepared to talk about your experience with load testing and scaling applications. Share specific instances where you successfully improved system performance under high demand.
✨Emphasise Collaboration
Highlight your ability to work with engineering teams to improve operations without taking over their services. Discuss how you’ve supported teams in achieving their goals while maintaining a focus on reliability and performance.