Site Reliability Engineer in London

London | Full-Time | £60,000 - £80,000 / year (est.) | Home office (partial)
Signal AI

At a Glance

  • Tasks: Join our team to enhance AI-driven infrastructure and ensure operational excellence.
  • Company: Signal AI, a forward-thinking company revolutionising decision-making with AI technology.
  • Benefits: Inclusive workplace, competitive salary, and opportunities for professional growth.
  • Other info: Diverse perspectives are valued; we encourage all passionate candidates to apply.
  • Why this job: Shape the future of AI operations and make a real impact in a collaborative environment.
  • Qualifications: Experience with AWS, Terraform, and programming in Python or Go.

The predicted salary is between £60,000 and £80,000 per year.

We're on a mission to change the way businesses make decisions with our cutting-edge AI technology. To achieve that, we’re looking for passionate people to join our open and inclusive workplace. Our inclusive environment welcomes skills and experiences from diverse backgrounds, and defines who we are.

We're hiring an SRE to help us run and evolve the infrastructure behind Signal AI's decision intelligence platform. You'd be joining a small, collaborative Infrastructure team at a moment when the work is genuinely changing shape. Over the last year we've hardened the platform, reduced cost, and built serious observability into our highest-volume systems. The next year is about scaling that work, absorbing infrastructure from a recent acquisition, and being thoughtful about how AI shows up in operational work: not as a gimmick, but as a tool we trust ourselves to use well.

We're looking for someone who wants to shape the direction of the team; someone who brings curiosity and care to the work, and who wants to leave things meaningfully better than they found them.

What we've shipped recently
  • Cut ~$50k/year off our Elasticsearch bill by migrating compute to more efficient chips. (Apr 2026)
  • Built the foundation for our MCP server platform: leveraging and contributing to open-source tooling to give the whole company extensible, production-grade AI integrations. (2025–2026)
  • Rebuilt production from scratch in a full DR gameday. End-to-end restore validated across our multi-account AWS setup. (Jan 2026)
What we're working on next
  • AI-augmented operations: Claude Enterprise is deployed across Signal. We want this team to help define what good looks like for SRE: incident triage, runbook generation, capacity planning, cost analysis. This is a strategic investment, not a side project, and we'd love someone genuinely curious about what these tools can and can't do.
  • Security in the age of AI: The threat landscape has shifted. Software supply chains are under greater threat than ever, and powerful models are emerging that promise to change how the industry thinks about security. We're looking for someone interested in thinking seriously about what actually matters to protect now.
  • Acquisition integration: Bringing a recently acquired product's infrastructure under our reliability, security, and operational standards. A substantial, multi-quarter piece of work with real technical and organisational complexity, and plenty of room to make your mark.
  • Batch workload consolidation: Moving disparate batch jobs onto EKS for unified scheduling, cost visibility, and operational tooling.
Your first six months

We want to set you up to thrive. Here's what that looks like in practice:

  • Month 1: You're onboarded across our AWS estate, Terraform, and observability stack. You've completed your first on-call shift with support from the team, landed your first PR in the DevOps repo, and started working Claude Enterprise into your daily flow.
  • Month 3: You're owning a workstream end-to-end. You've led the SRE response to at least one production incident and hosted your first post-mortem. You've surfaced a real opportunity and pushed it to a measurable result.
  • Month 6: You're driving a multi-quarter workstream with clear direction, and you're contributing insights to our AI-in-operations playbook, including where Claude adds real leverage and where it doesn't.
What we’re looking for

You have solid AWS and Terraform experience, and you're comfortable writing Python or Go to solve operational problems. You think in distributed systems (failure modes, observability, blast radius), and you take problems end-to-end rather than stopping at the edges of your own work.

You're pragmatic about AI tooling. Not evangelical, not dismissive. You can tell us when you'd reach for an LLM and when you wouldn't, and you'd have a clear reason either way.

You communicate openly and you're comfortable pushing back when you think something could be better. We want to leverage your experience and perspective to grow our platform.

We know not every strong candidate will have every skill on this list. If you're excited about the work and you're close on the experience, we'd encourage you to apply.

Nice to haves
  • Networking depth. You're comfortable below the load balancer: TCP/IP fundamentals, DNS, VPC design, and what actually happens when a service can't reach another one.
  • Operational security instincts. You follow the threat landscape with genuine interest: not just CVEs, but shifts in how attacks happen and how the industry is responding. You have a point of view on what actually matters right now.
  • Linux internals comfort. When something behaves strangely under load, you know where to look.
  • Communication across technical levels. You can collaborate with your infrastructure teammates and explain the same concepts clearly to a product manager. You've worked alongside colleagues with a wide range of technical backgrounds and adapted naturally.

Not sure you meet every requirement? Studies show that women and other underrepresented groups often hesitate to apply unless they check every box. At Signal AI, diverse perspectives strengthen our teams, drive innovation, and lead to better performance. So even if your background doesn’t align perfectly with each qualification, we encourage you to apply if you’re passionate about this role.

We're dedicated to creating an inclusive environment where every Signaller feels welcomed, valued, and heard—a place where you can truly thrive as yourself.

Site Reliability Engineer in London employer: Signal AI

At Signal AI, we pride ourselves on being an exceptional employer that fosters a collaborative and inclusive work culture, where diverse perspectives are not just welcomed but celebrated. As a Site Reliability Engineer, you'll have the opportunity to work with cutting-edge AI technology in a supportive environment that prioritises employee growth and innovation, ensuring you can make a meaningful impact while advancing your career. Our commitment to professional development, coupled with our strategic focus on AI-augmented operations, makes this an exciting time to join our team in shaping the future of decision intelligence.

Contact Details: Signal AI Recruitment Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Site Reliability Engineer role in London

Tip Number 1

Network like a pro! Reach out to current employees on LinkedIn or at industry events. Ask them about their experiences and the company culture. This not only shows your interest but can also give you insider info that might help you stand out.

Tip Number 2

Prepare for the interview by understanding the company's recent projects and challenges. Dive into their tech stack and think about how your skills can contribute. When you show that you’re already thinking about solutions, it’ll impress the hiring team!

Tip Number 3

Practice your problem-solving skills! You might face technical challenges during interviews, so brush up on your AWS, Terraform, and coding in Python or Go. The more comfortable you are, the better you’ll perform when it counts.

Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen. Plus, it shows you’re genuinely interested in being part of our team. Let’s make this happen together!

We think you need these skills to ace the Site Reliability Engineer role in London

  • AWS
  • Terraform
  • Python
  • Go
  • Distributed Systems
  • Observability
  • Incident Response

Some tips for your application 🫡

Show Your Passion: When you're writing your application, let your enthusiasm for the role shine through! We want to see that you're genuinely excited about the work we're doing at Signal AI and how you can contribute to our mission.

Tailor Your Application: Make sure to customise your CV and cover letter to highlight relevant experience and skills that match the job description. We love seeing how your unique background can bring something special to our team!

Be Clear and Concise: Keep your application straightforward and to the point. Use clear language to explain your experiences and how they relate to the role of Site Reliability Engineer. We appreciate clarity and directness!

Apply Through Our Website: Don't forget to submit your application through our website! It’s the best way for us to receive your details and ensures you’re considered for the role. Plus, it shows you’re keen on joining our team!

How to prepare for a job interview at Signal AI

Know Your Tech Stack

Make sure you’re well-versed in AWS, Terraform, and the programming languages mentioned, like Python or Go. Brush up on your knowledge of distributed systems and be ready to discuss how you’ve tackled operational problems in the past.

Show Your Curiosity

Demonstrate your genuine interest in AI-augmented operations and security. Prepare examples of how you've used AI tools effectively and when you think they might not be the best fit. This shows you’re pragmatic and thoughtful about technology.

Communicate Clearly

Practice explaining complex technical concepts in simple terms. You’ll need to collaborate with both technical and non-technical team members, so being able to adapt your communication style is key. Think of examples where you’ve successfully done this before.

Be Ready to Discuss Challenges

Prepare to talk about a time you faced a significant challenge in your previous roles, especially related to incident response or infrastructure integration. Highlight what you learned and how you improved processes as a result. This will show your problem-solving skills and ability to learn from experience.