Site Reliability Engineer

Wave Talent

Full-Time · £136,000 - £180,000 per year (est.) · No working from home possible

At a Glance

Tasks: Own observability and design distributed systems for a fast-growing startup.
Company: Dynamic developer infrastructure startup with a focus on innovation.
Benefits: Competitive salary, equity options, and remote work flexibility.
Other info: High standards and growth opportunities in a collaborative environment.
Why this job: Join a small team tackling real-world challenges in a high-impact role.
Qualifications: Experience in observability and distributed systems, and proficiency in TypeScript or Go.

The predicted salary is between £136,000 and £180,000 per year.

We're partnering with a fast-growing developer infrastructure startup on a senior SRE hire at a pivotal moment in their growth. The platform runs AI agents and background workflows in production at massive scale, handling hundreds of millions of executions per month on infrastructure the company runs itself. The team is around 13 people, with no engineering managers. Engineers own large parts of the system and work directly with the founders.

The core challenge right now is scale. Execution volume is growing faster than the team can build, which means the next hires are walking into genuine distributed systems problems — not a greenfield rebuild or a dashboard feature.

What you'll be working on:

Owning observability across the platform (OpenTelemetry, metrics, logs, traces) and making it genuinely useful at 3am
Designing and operating distributed systems primitives under real production load — queues, schedulers, checkpoints, backpressure
Architecting and tuning auto-scaling infrastructure that runs untrusted customer code at high throughput
Hardening multi-tenant sandbox isolation, secrets handling, network policy, and supply chain security
Owning Terraform and IaC as a first principle across a cloud-native footprint
Running the on-call practice: SLOs, runbooks, blameless postmortems, paging hygiene

What they're looking for:

Strong observability background: production experience with OpenTelemetry, Prometheus, or equivalent
Distributed systems experience: you've designed or operated systems with non-trivial failure modes
Strong with TypeScript and/or Go; the codebase is TypeScript-heavy, with Go emerging as a second language
Self-managed Kubernetes in production, not just managed control planes
Performance and scaling instincts; you've chased real bottlenecks across app, database, and infra layers
Terraform as a first principle, run at meaningful scale
Security mindset: multi-tenant isolation, least privilege, threat modelling
Postgres and Redis under load; AWS strongly preferred

The process:

Screening call
Hiring manager conversation
Technical interview, with roughly a 10% pass rate
Final interview with the wider team

The bar is high, but if you find that motivating rather than off-putting, that's probably a good sign.

Site Reliability Engineer employer: Wave Talent

Join a dynamic and innovative startup that prioritises employee autonomy and ownership, where engineers directly collaborate with founders to tackle real-world challenges in distributed systems. With a strong focus on personal growth and a culture that embraces high standards, you'll have the opportunity to work on cutting-edge technology while enjoying the flexibility of remote work across Europe or from London. The company offers competitive compensation, equity options, and a supportive environment that fosters creativity and problem-solving.

Contact Details:

Wave Talent Recruitment Team

StudySmarter Expert Advice🤫

We think this is how you could land the Site Reliability Engineer role

✨Tip Number 1

Get your networking game on! Reach out to current employees or connections in the industry. A friendly chat can give you insider info about the company culture and maybe even a referral, which can seriously boost your chances.

✨Tip Number 2

Prepare for those technical interviews like a pro! Brush up on your distributed systems knowledge and be ready to discuss real-world scenarios. Practising with mock interviews can help you feel more confident when it’s showtime.

✨Tip Number 3

Showcase your problem-solving skills! During interviews, don’t just talk about what you’ve done; explain how you tackled challenges. Use specific examples that highlight your experience with observability tools and scaling issues.

✨Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, we love seeing candidates who are proactive about their job search!

We think you need these skills to ace the Site Reliability Engineer role

Observability

OpenTelemetry

Prometheus

Distributed Systems

TypeScript

Kubernetes

Terraform

Security Mindset

Postgres

Redis

AWS

Performance Tuning

Scaling

Some tips for your application 🫡

Tailor Your CV: Make sure your CV reflects the skills and experiences that match the job description. Highlight your observability background and distributed systems experience, as these are key for this role.

Craft a Compelling Cover Letter: Use your cover letter to tell us why you're passionate about Site Reliability Engineering. Share specific examples of how you've tackled scaling challenges or improved system performance in your previous roles.

Showcase Your Technical Skills: Don't shy away from detailing your technical expertise, especially with TypeScript, Go, and Terraform. We want to see how you've applied these in real-world scenarios, so be specific!

Apply Through Our Website: We encourage you to apply directly through our website. It's the best way for us to receive your application and ensures you're considered for this exciting opportunity with our growing team.

How to prepare for a job interview at Wave Talent

✨Know Your Tech Inside Out

Make sure you’re well-versed in the technologies mentioned in the job description, especially OpenTelemetry, TypeScript, and Go. Brush up on your experience with distributed systems and be ready to discuss specific challenges you've faced and how you overcame them.

✨Demonstrate Your Problem-Solving Skills

Prepare to talk about real-world scenarios where you've tackled performance bottlenecks or scaling issues. Use examples that highlight your ability to think critically and act decisively under pressure, especially in a production environment.

✨Show Off Your Observability Knowledge

Since observability is key for this role, be ready to explain how you've implemented metrics, logs, and traces in previous projects. Discuss how you’ve made these tools genuinely useful for your team, especially during high-stress situations.

✨Emphasise Your Security Mindset

Security is a big deal in this role, so come prepared to discuss your approach to multi-tenant isolation and threat modelling. Share any experiences where you’ve had to ensure security while maintaining performance and scalability.