Site Reliability Engineer in London

London | Full-Time | 36000 - 60000 £ / year (est.) | No home office possible

At a Glance

Tasks: Own the reliability of our platform and lead incident response.
Company: Join Albatross, a cutting-edge AI company transforming user experience.
Benefits: Enjoy remote work, autonomy, and a supportive team culture.
Why this job: Make a real impact on innovative technology in a dynamic environment.
Qualifications: 5-7 years in SRE or similar roles with strong Kubernetes experience.
Other info: Contribute to open-source projects and grow your career with us.

The predicted salary is between £36,000 and £60,000 per year.

Location: Remote. You must have the right to work and travel in Europe.

At Albatross, we are building the second pillar of AI: a perception layer that understands how users actually experience content, in real time. Trained on live user interactions, Albatross learns and reasons on the fly. Our technology powers real-time, in-session discovery by adapting to evolving user interests. We have raised significant funding and our platform already operates at scale, processing billions of events and serving hundreds of millions of predictions.

The Role

We are looking for a Site Reliability Engineer to own the reliability and observability of our platform. This is a hands-on leadership role where you will design, build, and maintain our observability stack, lead incident response, oversee releases, and establish the processes and standards that allow the team to ship quickly and confidently.

More specifically, you will work across four areas:

Observability & Monitoring: Own and evolve our observability stack (Prometheus, Grafana, Loki, Jaeger), including dashboards, alerts, and SLOs. Instrument services for meaningful metrics and tracing, reducing noise and improving signal.
Reliability & Incident Response: Lead incident response and establish blameless postmortems, runbooks, and automated remediation. Define, track, and improve SLIs/SLOs to proactively reduce reliability risk.
Release Management: Own the release process end-to-end, improving deployment speed, safety, and recovery. Implement progressive rollouts, feature flags, and rollback strategies.
Platform & Tooling: Embed observability into the development lifecycle in close collaboration with engineering. Maintain and evolve our Kubernetes-based platform, adopting new tools when they add real value.

Requirements

5–7+ years in SRE, platform engineering, DevOps, or similar roles.
Strong production experience with Kubernetes and modern observability stacks (Prometheus, Grafana, Loki, Jaeger/OpenTelemetry).
Proven track record leading incident response and building monitoring systems teams actually use.
Deep distributed systems knowledge and production debugging experience.
Pragmatic approach to tooling and alerting that teams trust.
Clear communicator across engineering, product, and leadership.
STEM degree (Computer Science, Engineering, Mathematics, or similar).
Plus: contributions to open-source observability projects and a background in high-scale or high-availability environments.

Benefits

Remote-first, async-friendly culture.
Ownership and autonomy: you will shape how we do reliability.
A team that cares about building things right.

Site Reliability Engineer in London employer: Albatross

At Albatross, we pride ourselves on fostering a remote-first, asynchronous culture that empowers our employees to take ownership and shape the future of reliability in our innovative AI platform. With a strong focus on professional growth, we offer opportunities for meaningful contributions in a collaborative environment where your expertise in observability and incident response will be valued. Join us to work alongside a passionate team dedicated to building robust systems that truly understand user experiences, all while enjoying the flexibility of remote work across Europe.

Contact Details:

Albatross Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Site Reliability Engineer role in London

✨Tip Number 1

Network like a pro! Reach out to folks in the industry on LinkedIn or at meetups. A friendly chat can lead to opportunities that aren’t even advertised yet.

✨Tip Number 2

Show off your skills! Create a portfolio or GitHub repo showcasing your projects, especially those related to observability and reliability. This gives potential employers a taste of what you can do.

✨Tip Number 3

Prepare for interviews by brushing up on your incident response strategies and monitoring tools. Be ready to discuss real-life scenarios where you’ve made a difference in reliability.

✨Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, we love seeing candidates who are proactive!

We think you need these skills to ace the Site Reliability Engineer role in London

Observability Stack Management

Prometheus

Grafana

Loki

Jaeger

Incident Response

Blameless Postmortems

SLIs/SLOs Definition and Tracking

Release Management

Kubernetes

Production Debugging

Clear Communication

Tooling Pragmatism

Collaboration with Engineering Teams

Some tips for your application 🫡

Tailor Your CV: Make sure your CV reflects the skills and experiences that align with the Site Reliability Engineer role. Highlight your experience with Kubernetes and observability stacks like Prometheus and Grafana, as these are key to what we do at Albatross.

Craft a Compelling Cover Letter: Use your cover letter to tell us why you're passionate about reliability and observability. Share specific examples of how you've led incident responses or improved monitoring systems in your previous roles. We love hearing your story!

Showcase Your Technical Skills: Don’t shy away from getting technical! Include details about your hands-on experience with tools and technologies relevant to the role. This is your chance to show us you know your stuff and can hit the ground running.

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it shows us you’re keen on joining our team!

How to prepare for a job interview at Albatross

✨Know Your Tools Inside Out

Make sure you’re well-versed in the observability stack mentioned in the job description, like Prometheus, Grafana, and Jaeger. Be ready to discuss how you've used these tools in past roles, including specific examples of dashboards you've created or metrics you've tracked.

✨Showcase Your Incident Response Skills

Prepare to talk about your experience leading incident responses. Share a story where you established a blameless postmortem or improved SLIs/SLOs. This will demonstrate your ability to handle pressure and improve reliability.

✨Communicate Clearly and Confidently

Since clear communication is key for this role, practice explaining complex technical concepts in simple terms. You might be asked to explain your approach to embedding observability into the development lifecycle, so make sure you can articulate your thoughts clearly.

✨Emphasise Your Pragmatic Approach

Be prepared to discuss your pragmatic approach to tooling and alerting. Share examples of how you’ve implemented solutions that teams trust and use regularly. This shows that you understand the balance between functionality and usability.

Albatross

50-100
