At a Glance
- Tasks: Join our team to enhance AI-driven decision-making and manage cutting-edge infrastructure.
- Company: Signal AI, a forward-thinking tech company focused on innovation and inclusivity.
- Benefits: Competitive salary, inclusive culture, and opportunities for professional growth.
- Other info: Diverse perspectives are valued; we encourage all passionate candidates to apply.
- Why this job: Shape the future of AI operations and make a real impact in a collaborative environment.
- Qualifications: Experience with AWS, Terraform, and programming in Python or Go.
The predicted salary is between £60,000 and £80,000 per year.
We're on a mission to change the way businesses make decisions with our cutting-edge AI technology. To achieve that, we're looking for passionate people to join our open and inclusive workplace. We welcome skills and experiences from diverse backgrounds, and that inclusivity defines who we are. We're hiring an SRE to help us run and evolve the infrastructure behind Signal AI's decision intelligence platform. You'd be joining a small, collaborative Infrastructure team at a moment when the work is genuinely changing shape.
Over the last year we've hardened the platform, reduced cost, and built serious observability into our highest-volume systems. The next year is about scaling that work, absorbing infrastructure from a recent acquisition, and being thoughtful about how AI shows up in operational work: not as a gimmick, but as a tool we trust ourselves to use well. We're looking for someone who wants to shape the direction of the team; someone who brings curiosity and care to the work, and who wants to leave things meaningfully better than they found them.
What we've shipped recently:
- Cut ~$50k/year off our Elasticsearch bill by migrating compute to more efficient chips. (Apr 2026)
- Built the foundation for our MCP server platform: leveraging and contributing to open-source tooling to give the whole company extensible, production-grade AI integrations. (2025–2026)
- Rebuilt production from scratch in a full DR gameday. End-to-end restore validated across our multi-account AWS setup. (Jan 2026)
What we're working on next:
- AI-augmented operations: Claude Enterprise is deployed across Signal. We want this team to help define what good looks like for SRE: incident triage, runbook generation, capacity planning, cost analysis. This is a strategic investment, not a side project, and we'd love someone genuinely curious about what these tools can and can't do.
- Security in the age of AI: The threat landscape has shifted. Software supply chains are under more threat than ever, and powerful models are emerging that promise to change how the industry thinks about security. We're looking for someone interested in thinking seriously about what actually matters to protect now.
- Acquisition integration: Bringing a recently acquired product's infrastructure under our reliability, security, and operational standards. A substantial, multi-quarter piece of work with real technical and organisational complexity, and plenty of room to make your mark.
- Batch workload consolidation: Moving disparate batch jobs onto EKS for unified scheduling, cost visibility, and operational tooling.
Your first six months:
- Month 1: You're onboarded across our AWS estate, Terraform, and observability stack. You've completed your first on-call shift with support from the team, landed your first PR in the DevOps repo, and started working Claude Enterprise into your daily flow.
- Month 3: You're owning a workstream end-to-end. You've led the SRE response to at least one production incident and hosted your first post-mortem. You've identified a real opportunity and pushed it to a measurable result.
- Month 6: You're driving a multi-quarter workstream with clear direction, and you're contributing insights to our AI-in-operations playbook, including where Claude adds real leverage and where it doesn't.
What we’re looking for:
- You have solid AWS and Terraform experience, and you're comfortable writing Python or Go to solve operational problems.
- You think in distributed systems (failure modes, observability, blast radius), and you take problems end-to-end rather than stopping at the edges of your own work.
- You're pragmatic about AI tooling. Not evangelical, not dismissive. You can tell us when you'd reach for an LLM and when you wouldn't, and you'd have a clear reason either way.
- You communicate openly and you're comfortable pushing back when you think something could be better. We want to leverage your experience and perspective to grow our platform.
We know not every strong candidate will have every skill on this list. If you're excited about the work and you're close on the experience, we'd encourage you to apply.
Nice to haves:
- Networking depth. You're comfortable below the load balancer: TCP/IP fundamentals, DNS, VPC design, and what actually happens when a service can't reach another one.
- Operational security instincts. You follow the threat landscape with genuine interest: not just CVEs, but shifts in how attacks happen and how the industry is responding. You have a point of view on what actually matters right now.
- Linux internals comfort. When something behaves strangely under load, you know where to look.
- Communication across technical levels. You can collaborate with your infrastructure teammates and explain the same concepts clearly to a product manager. You've worked alongside colleagues with a wide range of technical backgrounds and adapted naturally.
Not sure you meet every requirement? Studies show that women and other underrepresented groups often hesitate to apply unless they check every box. At Signal AI, diverse perspectives strengthen our teams, drive innovation, and lead to better performance. So even if your background doesn't align perfectly with each qualification, we encourage you to apply if you're passionate about this role. We're dedicated to creating an inclusive environment where every Signaller feels welcomed, valued, and heard: a place where you can truly thrive as yourself.
Site Reliability Engineer employer: Signal AI
At Signal AI, we pride ourselves on fostering an open and inclusive workplace that values diverse perspectives and experiences. As a Site Reliability Engineer, you'll be part of a collaborative team dedicated to innovative AI solutions, with ample opportunities for personal and professional growth. Our commitment to employee development, coupled with a culture that encourages curiosity and meaningful contributions, makes us an exceptional employer in the tech industry.
StudySmarter Expert Advice🤫
Here's how we think you could land the Site Reliability Engineer role
✨Tip Number 1
Network like a pro! Reach out to current employees on LinkedIn or at industry events. Ask them about their experiences and the company culture. This not only gives you insider info but also shows your genuine interest in the role.
✨Tip Number 2
Prepare for the interview by diving deep into the company's recent projects and tech stack. Be ready to discuss how your skills can contribute to their goals, especially around AI and infrastructure. Show them you’re not just another candidate!
✨Tip Number 3
Practice your problem-solving skills! You might face technical challenges during interviews, so brush up on your AWS, Terraform, and coding skills. Think through distributed systems scenarios and be ready to share your thought process.
✨Tip Number 4
Don’t forget to follow up after your interview! A simple thank-you email can go a long way. Mention something specific from your conversation to remind them of your enthusiasm and fit for the role. And remember, apply through our website for the best chance!
Some tips for your application 🫡
Show Your Passion: When you're writing your application, let your enthusiasm for the role shine through! We want to see that you're genuinely excited about the work we're doing at Signal AI and how you can contribute to our mission.
Tailor Your Experience: Make sure to highlight the skills and experiences that align with the job description. Whether it's your AWS expertise or your knack for Python, we want to know how your background makes you a great fit for our team.
Be Yourself: Don't be afraid to let your personality come through in your application. We value authenticity and want to get to know the real you, so share the perspective and experiences that have shaped your approach to SRE.
Apply Through Our Website: We encourage you to apply directly through our website. It's the best way for us to receive your application and ensures you're considered for the role. Plus, it shows you're serious about joining our team!
How to prepare for a job interview at Signal AI
✨Know Your Tech Stack
Make sure you’re well-versed in AWS, Terraform, and the programming languages mentioned, like Python or Go. Brush up on distributed systems concepts and be ready to discuss how you’ve tackled operational problems in the past.
✨Show Your Curiosity
Demonstrate your genuine interest in AI and operational security. Be prepared to discuss how you would approach integrating AI tools into SRE practices and share your thoughts on what works and what doesn’t.
✨Prepare for Real-World Scenarios
Think about specific incidents you've managed or contributed to in previous roles. Be ready to explain your thought process during these situations, especially around incident triage and post-mortems.
✨Communicate Clearly
Practice explaining complex technical concepts in simple terms. You’ll need to collaborate with various teams, so being able to adapt your communication style is key. Show that you can bridge the gap between technical and non-technical stakeholders.