At a Glance
- Tasks: Enhance reliability and resilience of digital platforms through SRE practices.
- Company: Join a leading digital organisation in London with a hybrid work model.
- Benefits: Competitive salary, 25% bonus, excellent benefits, and career growth opportunities.
- Other info: Collaborate closely with product and engineering teams to drive innovation.
- Why this job: Make a real impact on customer experience and service reliability in a dynamic environment.
- Qualifications: Experience as an SRE with strong API and microservices knowledge.
Location: London (Hybrid)
Salary: £100,000 per annum + 25% Bonus + Excellent Benefits
We are hiring a Senior SRE to support a large-scale digital organisation undergoing a major commercial re-platforming across web and mobile channels. This role sits much closer to the application layer than traditional infrastructure SRE positions. You will work directly with product and engineering teams across customer-facing platforms (web, mobile, payment journeys, APIs) to improve reliability, resilience, and service behaviour in production. This is not a ticket-driven operational role and not a pure platform engineering post. It is about embedding measurable reliability into distributed systems at service level.
What You’ll Be Doing
- Embed SRE practices across API and microservices-based architectures
- Define and own meaningful SLIs/SLOs aligned to customer journeys and business-critical flows
- Improve service reliability through proactive observability, tracing, telemetry and alert tuning
- Partner closely with backend and platform engineers to reduce systemic failure modes
- Lead and contribute to incident response, post-incident reviews and resilience improvements
- Move the organisation from symptom-based alerting to customer-impact driven diagnostics
- Contribute to release safety, progressive deployments and production guardrails
What We’re Looking For
- Proven experience operating as an SRE within digital product environments
- Strong understanding of API architectures, microservices and distributed systems behaviour
- Hands-on experience defining and implementing SLIs, SLOs and error budgets
- Deep observability exposure (e.g. Datadog, Splunk, Prometheus, tracing/APM platforms)
- Experience working closely with application engineering teams, not just infrastructure teams
- Background in high-availability, customer-facing systems where outages have commercial impact
- Cloud-native exposure (AWS preferred) with practical understanding of Kubernetes environments
Important
This role is best suited to engineers who care deeply about production behaviour, customer experience in failure scenarios, and reliability as a first-class product feature, rather than engineers focused purely on infrastructure provisioning or CI/CD enablement.
Please get in touch with Benjamin Applewhaite to discuss the role in confidence.
Senior Site Reliability Engineer (Application / API Focused) employer: Xpertise Recruitment
Join a forward-thinking digital organisation in London that prioritises innovation and employee growth. As a Senior Site Reliability Engineer, you will thrive in a collaborative hybrid work environment, where your contributions directly enhance customer experiences across web and mobile platforms. With competitive compensation, a generous bonus structure, and a commitment to professional development, this company stands out as an exceptional employer for those seeking meaningful and rewarding careers.
StudySmarter Expert Advice🤫
We think this is how you could land the Senior Site Reliability Engineer (Application / API Focused) role
✨Tip Number 1
Network like a pro! Reach out to your connections in the industry, especially those who work in SRE roles. A friendly chat can lead to insider info about job openings or even referrals that could give you a leg up.
✨Tip Number 2
Show off your skills! Create a portfolio or GitHub repository showcasing your projects related to API architectures and microservices. This gives potential employers a tangible look at what you can do beyond just your CV.
✨Tip Number 3
Prepare for interviews by brushing up on your knowledge of SLIs, SLOs, and observability tools. Be ready to discuss how you've implemented these in past roles, as this will show you're not just familiar with the concepts but have real-world experience.
✨Tip Number 4
Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in joining our team at StudySmarter.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV reflects the skills and experiences that align with the Senior SRE role. Highlight your experience with API architectures and microservices, as well as any hands-on work with observability tools like Datadog or Prometheus.
Craft a Compelling Cover Letter: Use your cover letter to tell us why you’re passionate about reliability and customer experience. Share specific examples of how you've improved service reliability in past roles, and don’t forget to mention your collaborative work with engineering teams.
Showcase Your Problem-Solving Skills: In your application, emphasise your ability to tackle complex issues. We want to see how you've led incident responses or contributed to post-incident reviews, so include those experiences to demonstrate your proactive approach.
Apply Through Our Website: We encourage you to apply directly through our website for the best chance of getting noticed. It’s the easiest way for us to keep track of your application and ensure it reaches the right people!
How to prepare for a job interview at Xpertise Recruitment
✨Know Your SRE Fundamentals
Make sure you brush up on your Site Reliability Engineering principles, especially those related to API and microservices. Be ready to discuss how you've embedded SRE practices in previous roles and how you define SLIs and SLOs that align with customer journeys.
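If it helps to rehearse the arithmetic behind SLIs, SLOs and error budgets before the interview, here is a minimal Python sketch. It assumes a simple request-based availability SLI for a single customer journey; the `JourneyWindow` shape, the "checkout" journey and the 99.9% target are illustrative assumptions, not details taken from the role.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JourneyWindow:
    """Request counts for one customer journey over an SLO window (hypothetical shape)."""
    name: str
    total_requests: int
    good_requests: int


def availability_sli(window: JourneyWindow) -> float:
    """SLI: the fraction of requests in the window that were 'good'."""
    if window.total_requests == 0:
        return 1.0  # no traffic, so nothing has failed
    return window.good_requests / window.total_requests


def error_budget_remaining(window: JourneyWindow, slo_target: float) -> float:
    """Fraction of the error budget left: 1.0 untouched, 0.0 spent, negative if overspent."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if window.total_requests == 0:
        return 1.0
    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * window.total_requests
    actual_failures = window.total_requests - window.good_requests
    return 1.0 - actual_failures / allowed_failures


# Illustrative numbers only: 600 failed checkout requests out of one million.
checkout = JourneyWindow(name="checkout", total_requests=1_000_000, good_requests=999_400)
print(f"SLI: {availability_sli(checkout):.4%}")
print(f"Budget remaining: {error_budget_remaining(checkout, slo_target=0.999):.1%}")
```

With a 99.9% target, one million requests allow roughly 1,000 failures, so 600 failures leave about 40% of the budget. Being able to walk through a calculation like this for a real journey you have owned is a good way to show hands-on experience.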
✨Showcase Your Technical Skills
Prepare to talk about your hands-on experience with observability tools like Datadog or Prometheus. Have specific examples ready where you've improved service reliability through proactive monitoring and alert tuning, as this will demonstrate your technical prowess.
✨Emphasise Collaboration
This role requires close partnership with product and engineering teams. Be prepared to share examples of how you've worked collaboratively with backend engineers to reduce systemic failures and improve production behaviour. Highlight your communication skills and teamwork.
✨Focus on Customer Impact
Since the role is about embedding reliability as a product feature, think about how you've previously moved from symptom-based alerting to customer-impact driven diagnostics. Discuss any experiences where your actions directly improved customer experience during failure scenarios.
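One common way to make alerting customer-impact driven is to page on how fast an SLO's error budget is being consumed rather than on individual symptoms. The sketch below is a simplified Python illustration of a multi-window burn-rate check. The function names are hypothetical, and the 14.4 threshold is the figure often quoted for a one-hour window against a 30-day SLO, so treat it as an assumption to tune rather than a rule.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are occurring."""
    budget = 1.0 - slo_target
    if budget <= 0.0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def should_page(
    long_window_error_ratio: float,
    short_window_error_ratio: float,
    slo_target: float,
    threshold: float = 14.4,  # assumed threshold, not a value from the job description
) -> bool:
    """Page only if both windows burn fast.

    The long window (for example one hour) shows the impact is significant;
    the short window (for example five minutes) shows it is still happening.
    """
    return (
        burn_rate(long_window_error_ratio, slo_target) >= threshold
        and burn_rate(short_window_error_ratio, slo_target) >= threshold
    )


# Illustrative values: 2% of payment requests failing against a 99.9% SLO.
print(should_page(0.02, 0.02, slo_target=0.999))   # sustained, ongoing impact
print(should_page(0.02, 0.001, slo_target=0.999))  # impact has already subsided
```

The first call pages because customers are failing at roughly twenty times the sustainable rate in both windows; the second does not, because the short window shows recovery. Framing an answer this way ties the alert to what customers experienced rather than to a CPU or memory symptom.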