At a Glance
- Tasks: Keep our platform fast and secure while scaling to millions of executions.
- Company: Join a cutting-edge developer platform focused on AI agents and workflows.
- Benefits: Enjoy competitive pay, flexible remote work, generous vacation, and a training budget.
- Other info: Be part of a small, autonomous team that values continuous learning and community contributions.
- Why this job: Make a real impact in a fast-paced startup environment with a focus on open source.
- Qualifications: Experience in distributed systems, observability, and cloud-native technologies is essential.
The predicted salary is between £70,000 and £90,000 per year.
About Trigger.dev
Trigger.dev is a developer platform for building and running AI agents and workflows. We provide everything needed to create production-grade agents: an SDK, plus deployment, scaling, monitoring, and debugging, all without needing to manage any infrastructure. Our Cloud product is a managed service where we deploy our users' code and auto-scale from zero to millions of executions. Today, we serve thousands of teams building AI apps and agents, handling hundreds of millions of executions per month.
About the position
We're hiring a Senior Site Reliability Engineer to keep Trigger.dev fast, observable and hard to break as we scale. You'll work across our open source codebase and the Cloud product that runs it in production. We're handling hundreds of millions of executions a month on infrastructure we run ourselves, and the next order of magnitude needs someone who thinks in distributed systems and treats observability and security as part of the product, not bolted on later.
Day to day you'll be chasing bottlenecks, hardening services like the sandbox runtime that executes untrusted user code, and making the platform legible to the engineers running it at 3am.
What you'll be doing
- Owning observability across the platform. Extending our OpenTelemetry instrumentation, sanding down noisy signals, and making metrics, logs and traces something engineers actually reach for during incidents.
- Designing and operating the distributed systems primitives we lean on (queues, schedulers, checkpoints, idempotency, backpressure) under real production load.
- Architecting and tuning the auto-scaling infrastructure that runs untrusted customer code at high throughput.
- Hunting bottlenecks across the stack, from Postgres query plans and Redis hot keys down to kernel, cgroup and network behaviour.
- Hardening the security posture of our multi-tenant runtime: sandbox isolation, secrets handling, network policy, supply chain.
- Owning Terraform and IaC as the source of truth for our cloud-native footprint, rather than an afterthought.
- Working on runtime internals: CPU/RAM snapshotting, cold-start optimization, live migration between hosts, resilient distributed file storage.
- Designing and running our on-call practice: runbooks, SLOs, blameless postmortems, paging hygiene.
- Making the rest of engineering faster and safer by keeping the platform easy to reason about.
- Contributing to architectural decisions and the technical roadmap.
We build in public, so you can see some of the things you might work on here.
Working at a Commercial Open Source Software company is more than just coding:
- We have an active community on Discord and GitHub. Everyone on the team helps customers, reviews PRs, and creates issues.
- Having great documentation is essential. Everyone writes docs.
- We're a product-led growth company, so everyone is expected to get involved in creating content like code examples, blog articles, videos, and tweets.
Requirements
You really need to have:
- Strong observability chops. Production experience with OpenTelemetry, Prometheus or equivalent, and opinions about cardinality, sampling and signal-to-noise.
- Distributed systems experience. You've designed or operated systems with non-trivial failure modes (queues, consensus, replication, idempotency).
- Cloud-native fluency. Comfortable in the CNCF orbit (Kubernetes, OTel, Argo, Crossplane, eBPF) rather than wedded to any one tool.
- Self-managed Kubernetes in production, not just clicking around managed control planes.
- Performance and scaling debugging instincts. You've chased real bottlenecks across application, database and infrastructure layers.
- Terraform fandom. IaC as a first principle, with experience running it at meaningful repo and team scale.
- Security mindset. Multi-tenant isolation, least privilege, secrets management, threat modelling.
- Expertise with Postgres and Redis under load.
- Experience with Go.
- Familiarity with Linux.
- Cloud infrastructure experience. AWS strongly preferred, GCP/Azure considered.
- Comfort with being on call, and an understanding that reliability is a shared responsibility across the engineering team.
You'll be an amazing fit if you have:
- Experience running container orchestration at scale (cold starts, pod density, scheduler tuning, IP exhaustion, image pull optimization).
- Worked with microVMs (e.g. Firecracker) or other sandbox runtimes (e.g. gVisor) for executing untrusted code.
- Demonstrated system scaling and performance optimization on high-throughput platforms.
- A proven track record of contributing to open source projects, especially in the observability or cloud-native ecosystem (OTel collector, Prometheus, Grafana stack, k8s operators).
- Expertise in Node.js and TypeScript (we use Remix), enough to land changes in our application code, not just the infrastructure around it.
- Experience with React, or better still, Remix.
- Designed SDKs for developers.
- Worked at a developer tools company or commercial open source company.
- Previously been a venture-backed startup founder.
About the team
We are a small team with a flat hierarchy. We encourage continuous learning and personal development. Everyone operates autonomously; we trust you to get the job done, but also to raise your hand quickly if you need help. We care about our customers and the open-source community. As a startup, our work is challenging and fast-paced, but we also understand the importance of taking time off to recharge.
Salary
We use PostHog's salary calculator to benchmark fair and transparent compensation, which varies based on employee location and level of experience (this role falls under "Backend engineer").
Benefits
- Generous, transparent compensation and equity - We hire the best talent and pay to reflect that. We also offer equity as a way to ensure everyone is invested in the success of the company.
- Async working - Need a heads-down day or meeting-free days to stay productive? No problem!
- Home office - We will help provide equipment for a comfortable setup so you're as productive at home as you are in the office.
- Generous vacation - We believe it's important to take time off. We encourage you to take all 25 days of your vacation, which are in addition to national holidays. You also get sick leave and generous parental leave.
- Training budget - An annual budget for learning, such as books, an Audible subscription, or online courses.
- Pension and 401(k) contributions - Enroll in our company pension scheme, or we'll contribute directly to your private pension.
Our values
- We are proud to be open source - We believe in the open source community and building a great free-to-use product. We encourage community contributions and are public about what we are building.
- We ship uncomfortably fast - As a startup, moving fast is key. We are pragmatic about what we build, when we build it, and value iterative development above all.
- Working autonomously - We don't tell you what to do. We decide as a team what will have the biggest impact for our customers and prioritise what we work on from that.
Interview process
- Application. We'll review your application, CV, and any code samples or open source contributions you share.
- Screening call. A chat about your background, the systems you've built, and what you're looking for next.
- Hiring manager call. A deeper technical conversation about your work and how you'd approach the role.
- Paid task day. You'll spend a paid day working on a real engineering problem with us, then walk us through your thinking with the team.
- Final interview. A call with a couple of members of the wider team to answer any remaining questions and align on culture fit.
- References & offer. If everyone is happy, we'll make you an offer to join us.
If this sounds like your dream job, we can't wait to hear from you. If you're not sure that you exactly fit these requirements, get in touch anyway!
Senior Site Reliability Engineer (Europe) employer: Trigger.dev
At Trigger.dev, we pride ourselves on fostering a dynamic and inclusive work culture that empowers our employees to thrive. As a Senior Site Reliability Engineer, you'll enjoy generous benefits including flexible working arrangements, a robust training budget, and a commitment to personal development, all while contributing to an innovative open-source platform that is shaping the future of AI. Our collaborative environment encourages autonomy and creativity, making it an ideal place for those looking to make a meaningful impact in a fast-paced startup atmosphere.
StudySmarter Expert Advice🤫
We think this is how you could land Senior Site Reliability Engineer (Europe)
✨Tip Number 1
Network like a pro! Reach out to folks in your industry on LinkedIn or join relevant Discord channels. We all know that sometimes it’s not just what you know, but who you know that can land you that dream job.
✨Tip Number 2
Get ready for those interviews! Brush up on your technical skills and be prepared to discuss your past projects. We want to see how you think and solve problems, so practice explaining your thought process clearly.
✨Tip Number 3
Show off your passion for open source! If you've contributed to any projects, make sure to highlight that. It shows you’re engaged with the community and understand the importance of collaboration, which is key for us at Trigger.dev.
✨Tip Number 4
Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, we love seeing candidates who take the initiative to connect directly with us.
Some tips for your application 🫡
Show Your Passion: When you're writing your application, let your enthusiasm for the role shine through! We want to see that you’re genuinely excited about working with us at Trigger.dev and contributing to our mission.
Tailor Your CV: Make sure your CV is tailored to the Senior Site Reliability Engineer position. Highlight your experience with distributed systems, observability tools, and any relevant projects you've worked on. We love seeing how your skills align with what we do!
Be Clear and Concise: Keep your application clear and to the point. Use straightforward language to describe your experiences and achievements. We appreciate a well-structured application that makes it easy for us to see your qualifications.
Apply Through Our Website: Don’t forget to apply through our website! It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it shows you’re serious about joining our team!
How to prepare for a job interview at Trigger.dev
✨Know Your Stuff
Make sure you brush up on your knowledge of distributed systems and observability tools like OpenTelemetry and Prometheus. Be ready to discuss your hands-on experience with these technologies, as well as any challenges you've faced and how you overcame them.
✨Showcase Your Problem-Solving Skills
Prepare to share specific examples of how you've hunted down bottlenecks in production systems. Think about times when you optimised performance or improved security in a multi-tenant environment, and be ready to explain your thought process during those situations.
✨Get Familiar with the Company Culture
Since Trigger.dev values open source contributions and community engagement, it’s a good idea to highlight any relevant projects you've worked on. Show that you understand their mission and are excited about being part of a team that prioritises collaboration and continuous learning.
✨Ask Insightful Questions
Prepare some thoughtful questions about the role and the company. Inquire about their approach to scaling infrastructure or how they handle on-call practices. This shows you're genuinely interested and have done your homework, which can set you apart from other candidates.