Site Reliability Engineer in London

United States Digital Space LLC

London | Full-Time | £50,000 / year (est.) | Home office (partial)

At a Glance

Tasks: Join our team to enhance AI-driven decision-making and manage cutting-edge infrastructure.
Company: Dynamic tech company focused on innovation and inclusivity.
Benefits: Competitive salary, inclusive culture, and opportunities for professional growth.
Other info: Diverse perspectives are valued; apply even if you don't meet every requirement.
Why this job: Shape the future of AI operations and make a real impact in a collaborative environment.
Qualifications: Experience with AWS, Terraform, and programming in Python or Go.

We're on a mission to change the way businesses make decisions with our cutting-edge AI technology. To achieve that, we’re looking for passionate people to join our open and inclusive workplace. Our inclusive environment welcomes skills and experiences from diverse backgrounds, and defines who we are.

We're hiring an SRE to help us run and evolve the infrastructure behind the company's decision intelligence platform. You'd be joining a small, collaborative Infrastructure team at a moment when the work is genuinely changing shape. Over the last year we've hardened the platform, reduced cost, and built serious observability into our highest-volume systems. The next year is about scaling that work, absorbing infrastructure from a recent acquisition, and being thoughtful about how AI shows up in operational work: not as a gimmick, but as a tool we trust ourselves to use well.

We're looking for someone who wants to shape the direction of the team; someone who brings curiosity and care to the work, and who wants to leave things meaningfully better than they found them.

What we've shipped recently:

Cut ~$50k/year off our Elasticsearch bill by migrating compute to more efficient chips. (Apr 2026)
Built the foundation for our MCP server platform, leveraging and contributing to open-source tooling to give the whole company extensible, production-grade AI integrations. (2025–2026)
Rebuilt production from scratch in a full DR gameday. End-to-end restore validated across our multi-account AWS setup. (Jan 2026)

What we're working on next:

AI-augmented operations: Claude Enterprise is deployed across Signal. We want this team to help define what good looks like for SRE: incident triage, runbook generation, capacity planning, cost analysis. This is a strategic investment, not a side project, and we'd love someone genuinely curious about what these tools can and can't do.
Security in the age of AI: The threat landscape has shifted. Supply chains are under greater threat than ever, and powerful models are emerging that promise to change how the industry thinks about security. We're looking for someone interested in thinking seriously about what actually matters to protect now.
Acquisition integration: Bringing a recently acquired product's infrastructure under our reliability, security, and operational standards. A substantial, multi-quarter piece of work with real technical and organisational complexity, and plenty of room to make your mark.
Batch workload consolidation: Moving disparate batch jobs onto EKS for unified scheduling, cost visibility, and operational tooling.

Your first six months:

We want to set you up to thrive. Here's what that looks like in practice:

Month 1: You're onboarded across our AWS estate, Terraform, and observability stack. You've completed your first on‑call shift with support from the team, landed your first PR in the DevOps repo, and started working Claude Enterprise into your daily flow.
Month 3: You're owning a workstream end-to-end. You've led the SRE response to at least one production incident and hosted your first post-mortem. You've surfaced a real opportunity and pushed it to a measurable result.
Month 6: You're driving a multi-quarter workstream with clear direction, and you're contributing insights to our AI-in-operations playbook, including where Claude adds real leverage and where it doesn't.

What we’re looking for:

You have solid AWS and Terraform experience, and you're comfortable writing Python or Go to solve operational problems.
You think in distributed systems (failure modes, observability, blast radius), and you take problems end-to-end rather than stopping at the edges of your own work.
You're pragmatic about AI tooling: not evangelical, not dismissive. You can tell us when you'd reach for an LLM and when you wouldn't, and you'd have a clear reason either way.
You communicate openly and you're comfortable pushing back when you think something could be better. We want to leverage your experience and perspective to grow our platform.

We know not every strong candidate will have every skill on this list. If you're excited about the work and you're close on the experience, we'd encourage you to apply.

Nice to haves:

Networking depth. You're comfortable below the load balancer: TCP/IP fundamentals, DNS, VPC design, and what actually happens when a service can't reach another one.
Operational security instincts. You follow the threat landscape with genuine interest: not just CVEs, but shifts in how attacks happen and how the industry is responding. You have a point of view on what actually matters right now.
Linux internals comfort. When something behaves strangely under load, you know where to look.
Communication across technical levels. You can collaborate with your infrastructure teammates and explain the same concepts clearly to a product manager. You've worked alongside colleagues with a wide range of technical backgrounds and adapted naturally.

Not sure you meet every requirement? Studies show that women and other under‑represented groups often hesitate to apply unless they check every box. At the company, diverse perspectives strengthen our teams, drive innovation, and lead to better performance. So even if your background doesn’t align perfectly with each qualification, we encourage you to apply if you’re passionate about this role. We're dedicated to creating an inclusive environment where every Signaller feels welcomed, valued, and heard—a place where you can truly thrive as yourself.

Compensation Range: £70K - £85K

Site Reliability Engineer in London employer: United States Digital Space LLC

Join a forward-thinking company that values diversity and innovation, where your contributions as a Site Reliability Engineer will directly impact the evolution of our cutting-edge AI technology. With a collaborative work culture and a commitment to employee growth, you'll have the opportunity to shape the future of our infrastructure while enjoying competitive compensation and a supportive environment that encourages curiosity and creativity.

Contact Details:

United States Digital Space LLC Recruitment Team

StudySmarter Expert Advice🤫

We think this is how you could land Site Reliability Engineer in London

✨Join Local Tech Meetups

Get out there and mingle with fellow developers by joining local tech meetups. It's a fantastic way to meet people who might be working at United States Digital Space LLC or know someone who does. Plus, you can pick up new skills and keep up with current trends while you're at it!

✨Contribute to Open Source Projects

Show off your coding chops by jumping into open-source projects. Not only does this give you practical experience, but it also gets you noticed in the dev community. You'll create a killer portfolio that speaks volumes about your skills to United States Digital Space LLC.

✨Tap into Online Developer Communities

Don't underestimate the power of online developer communities like GitHub, Stack Overflow, and even Reddit. Participate in discussions, share your projects, and build your visibility. You can often find opportunities through these channels that lead to a full-time role at companies like United States Digital Space LLC.

✨Explore Job Boards Specifically for Tech Roles

Keep an eye on job boards that focus on tech roles. Sites like TechCareers often carry listings for companies like United States Digital Space LLC that might not show up on broader job sites. Make it a habit to check these regularly, and don't hesitate to apply directly through our website!

We think you need these skills to ace Site Reliability Engineer in London

AWS

Terraform

Python

Distributed Systems

Observability

Incident Response

Post-Mortem Analysis

AI Tooling

Networking Fundamentals

Operational Security

Linux Internals

Communication Skills

Some tips for your application 🫡

Show off your coding skills: When applying for a software engineering role, it's super important to showcase your coding skills. Make sure your CV includes your tech stack, any relevant programming languages you're comfortable with, and examples of projects you've worked on. If you have a GitHub profile, link it up! We love to see code in action.

Tailor your portfolio: For a full-time role, we'd expect to see some solid examples of your work in your portfolio. Make sure to include at least two or three projects that highlight your problem-solving skills and your ability to work with different technologies. Focus on the projects that are most relevant to the position at United States Digital Space LLC.

Craft a killer cover letter: Your cover letter is your chance to stand out, so make it personal! Explain why you want to work at United States Digital Space LLC and how your skills align with the role. Show us your passion for software development. We dig enthusiastic candidates who understand the value of collaboration and continuous learning!

Be clear and concise: When it comes to writing your CV and cover letter, clarity is key. Avoid jargon that could confuse us and stick to simple, direct language. Highlight your achievements with quantifiable results where possible, and keep everything easy to read. A well-organised application goes a long way!

How to prepare for a job interview at United States Digital Space LLC

✨Brush Up on Your Coding Skills

For a full-time software engineering role, it's crucial that you stay sharp with your coding abilities. Expect technical questions that might involve solving problems on the spot or discussing algorithms. Practise on platforms like LeetCode or HackerRank to get comfortable with the types of questions that often come up.

✨Know Your Tools and Frameworks

Make sure you're well acquainted with the tools and technologies listed in the job description, and familiarise yourself with any specific frameworks or programming languages mentioned. This role lists AWS, Terraform, and Python or Go, so be ready to discuss how you've used them in previous projects or coursework.

✨Showcase Your Projects

Bring along a portfolio that highlights your best work. This could be code samples, GitHub repositories, or any side projects you've built. Make sure you can talk through your thought process for each project, especially the challenges you faced and how you solved them; this shows your problem-solving skills in action.

✨Prepare for Behavioural Questions

While technical skills are key, full-time positions also require cultural fit. Be ready to discuss your previous experiences and how you handle teamwork, conflict, and deadlines. Brush up on the STAR method (Situation, Task, Action, Result) to clearly articulate your past experiences when discussing how you've contributed to a team.
