Site Reliability Engineer

Full-Time · £136,000 - £180,000 per year (est.) · No working from home possible

At a Glance

Tasks: Own observability, design distributed systems, and architect auto-scaling infrastructure.
Company: Fast-growing developer infrastructure startup with a dynamic team.
Benefits: Competitive salary, equity, and remote work options.
Other info: Join a small team where engineers own their projects and collaborate directly with founders.
Why this job: Tackle real challenges in scaling and make a significant impact.
Qualifications: Experience in observability, distributed systems, TypeScript/Go, and Kubernetes.

The predicted salary is between £136,000 and £180,000 per year.

We're partnering with a fast-growing developer infrastructure startup on a senior SRE hire at a pivotal moment in their growth. The platform runs AI agents and background workflows in production at massive scale, handling hundreds of millions of executions per month on infrastructure they run themselves.

The team is ~13 people, with no engineering managers. Engineers own large parts of the system and work directly with the founders. The core challenge right now is scale: execution volume is growing faster than the team can build, which means the next hires are walking into genuine distributed systems problems, not a greenfield rebuild or a dashboard feature.

What you'll be working on:

Owning observability across the platform: OpenTelemetry, metrics, logs, traces, and making them genuinely useful at 3am
Designing and operating distributed systems primitives under real production load — queues, schedulers, checkpoints, backpressure
Architecting and tuning auto-scaling infrastructure that runs untrusted customer code at high throughput
Hardening multi-tenant sandbox isolation, secrets handling, network policy, and supply chain security
Owning Terraform and IaC as a first principle across a cloud-native footprint
Running on-call practice: SLOs, runbooks, blameless postmortems, paging hygiene

What they're looking for:

Strong observability background: production experience with OpenTelemetry, Prometheus or equivalent
Distributed systems experience: you've designed or operated systems with non-trivial failure modes
Strong with TypeScript and/or Go; the codebase is TypeScript-heavy with Go emerging as a second language
Self-managed Kubernetes in production, not just managed control planes
Performance and scaling instincts; you've chased real bottlenecks across app, database, and infra layers
Terraform as a first principle, run at meaningful scale
Security mindset — multi-tenant isolation, least privilege, threat modelling
Postgres and Redis under load, AWS strongly preferred

The process:

Screening call
Hiring manager conversation
Technical interview with roughly a 10% pass rate
Final interview with the wider team

The bar is high, but if you find that motivating rather than off-putting, that's probably a good sign.

Site Reliability Engineer employer: Wave Talent

Join a dynamic and innovative startup that prioritises employee ownership and direct collaboration with founders, offering a unique opportunity to tackle real-world distributed systems challenges at scale. With a strong focus on observability and security, this role not only provides competitive compensation and equity but also fosters a culture of growth and learning in a supportive remote environment across Europe or London. Embrace the chance to make a significant impact while working alongside a small, dedicated team passionate about pushing the boundaries of developer infrastructure.

Contact Details:

Wave Talent Recruitment Team

StudySmarter Expert Advice🤫

We think this is how you could land Site Reliability Engineer

✨Tip Number 1

Network like a pro! Reach out to current employees on LinkedIn or other platforms. Ask them about their experiences and the company culture. This can give you insider info and might even lead to a referral!

✨Tip Number 2

Prepare for the technical interview by brushing up on your distributed systems knowledge. Dive deep into topics like observability, scaling, and security practices. We recommend doing mock interviews with friends or using online platforms to get comfortable.

✨Tip Number 3

Showcase your projects! If you've worked on relevant projects, make sure to highlight them during interviews. Discuss the challenges you faced and how you overcame them, especially in areas like Terraform and Kubernetes.

✨Tip Number 4

Apply through our website! It’s the best way to ensure your application gets seen. Plus, it shows you're genuinely interested in joining our team. Don’t forget to follow up after applying; a little persistence goes a long way!

We think you need these skills to ace Site Reliability Engineer

Observability

OpenTelemetry

Prometheus

Distributed Systems

TypeScript

Kubernetes

Terraform

Security Mindset

Multi-Tenant Isolation

Postgres

Redis

AWS

Performance Tuning

Scaling

Some tips for your application 🫡

Tailor Your CV: Make sure your CV speaks directly to the job description. Highlight your experience with distributed systems, observability tools like OpenTelemetry, and any relevant coding skills in TypeScript or Go. We want to see how your background aligns with our needs!

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're excited about the role and how you can tackle the challenges we face at StudySmarter. Be genuine and let your personality come through – we love that!

Showcase Your Projects: If you've worked on any projects that demonstrate your skills in scaling infrastructure or managing observability, make sure to include them. We’re keen to see real-world examples of your work and how you’ve solved complex problems.

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to keep track of your application and ensure it gets the attention it deserves. Plus, it shows you’re serious about joining our team!

How to prepare for a job interview at Wave Talent

✨Know Your Tech Inside Out

Make sure you’re well-versed in the technologies mentioned in the job description, especially OpenTelemetry, TypeScript, and Go. Brush up on your experience with distributed systems and be ready to discuss specific challenges you've faced and how you overcame them.

✨Showcase Your Problem-Solving Skills

Prepare to talk about real-world scenarios where you tackled performance bottlenecks or scaling issues. Use examples that highlight your ability to think critically and act decisively under pressure, especially in production environments.

✨Demonstrate Your Security Mindset

Given the emphasis on security in the role, be prepared to discuss your approach to multi-tenant isolation and threat modelling. Share any relevant experiences where you implemented security measures in a cloud-native environment.

✨Engage with the Team's Culture

Since the team is small and collaborative, show your enthusiasm for working closely with others. Be ready to discuss how you’ve contributed to team dynamics in previous roles and how you can bring that same energy to their startup environment.
