Site Reliability Engineer I in London

London | Full-Time | £90,000 - £100,000 / year (est.) | No home office possible

At a Glance

Tasks: Own and optimise the infrastructure for a fast-growing music platform.
Company: Equals, the world's largest social music network, connecting over a million users.
Benefits: Competitive salary, equity package, private health insurance, and office lunches.
Other info: Enjoy a dynamic work environment with regular team socials and growth opportunities.
Why this job: Make a real impact on user experience while working with cutting-edge technology.
Qualifications: Strong AWS experience, PostgreSQL knowledge, and incident response skills required.

The predicted salary is between £90,000 and £100,000 per year.

Equals is the world's largest social music network with over a million users, growing exponentially month to month. We connect people through the music they love - fans discover new people, collect tracks, and connect in artist chatrooms. Our platform serves users across the world with real-time chat, music streaming, and a recommendation engine that matches people by musical taste.

We're looking for a Site Reliability Engineer to own the infrastructure, observability, and operational health of the Equals platform. You'll be the person who monitors system needs and health to keep the user experience seamless, and who makes those needs and any failures traceable.

This is a sole-ownership role. You'll be responsible for our entire cloud infrastructure, CI/CD pipelines, monitoring stack, data pipelines, and database performance. You'll work closely with our engineers, but your focus is the platform underneath, not feature development.

WHAT YOU'LL OWN

Infrastructure & Cloud
- Manage and evolve our AWS infrastructure via Pulumi (TypeScript): ECS/Fargate services, RDS (PostgreSQL 17), ElastiCache (Redis with read replicas), S3, SQS, ALB, Lambda
- Scale infrastructure up and down for large data operations (e.g. music catalog ingestion of 1B+ rows)
- Manage Cloudflare (WAF, bot management, DNS, firewall rules)
- Make cost-conscious infrastructure decisions - right-sizing instances, storage tiering, optimizing spend

Monitoring & Observability
- Own the Datadog APM setup: tracing, alerting, dashboards, log management
- Maintain and tune alert channels integrated with Slack
- Reduce alert fatigue by tuning thresholds, suppressing false positives, and downgrading non-actionable errors
- Be the first responder when something breaks in production

Reliability & Incident Response
- Investigate and resolve production incidents end-to-end: detection, root cause analysis, fix, and post-mortem
- Handle database performance issues: slow query identification, index creation, query optimization, connection pool tuning
- Manage queue system reliability (BullMQ on Redis): concurrency tuning, rate limiting, stalled job handling, autoscaling
- Ensure graceful handling of failovers, deployments, and edge cases across all services

Data Pipelines & Warehouse
- Manage the Airbyte replication pipeline from production database to data warehouse
- Configure incremental replication for new tables as the product evolves
- Maintain the data warehouse (PostgreSQL): autovacuum tuning, memory parameters, capacity planning
- Manage RudderStack for event streaming to analytics (Amplitude) and attribution (AppsFlyer)

CI/CD & Deployment
- Own CircleCI pipeline configuration and reliability
- Manage ECS deployment strategies, health checks, and rollout verification
- Maintain test environments and cleanup automation

WHAT WE'RE LOOKING FOR

Must Have

- Strong experience with AWS (ECS/Fargate, RDS, ElastiCache, S3, ALB, SQS at minimum)
- Infrastructure-as-code experience - ideally Pulumi, but Terraform or CDK background is fine
- Deep PostgreSQL knowledge: performance tuning, indexing strategies, query optimization, connection pooling
- Experience with Redis at scale: clustering, read replicas, failover handling
- Solid understanding of container orchestration and deployment strategies
- Experience with monitoring and observability platforms (Datadog preferred)
- Comfort with incident response: you've been paged at 2am and know how to stay calm, diagnose, and fix
- Familiarity with CI/CD pipelines (CircleCI, GitHub Actions, or similar)

Nice to Have

- Experience with Pulumi specifically (TypeScript)
- Experience with data replication tools (Airbyte, Fivetran, or similar)
- Experience with event streaming platforms (RudderStack, Segment, or similar)
- Familiarity with BullMQ or similar Redis-based queue systems
- Experience with Cloudflare (WAF, bot management)
- Familiarity with NestJS / Node.js backend architectures (you won't be building features, but understanding how the backend works helps you support it)
- Experience scaling platforms through rapid growth phases

OUR STACK

Infrastructure: AWS (ECS, RDS, ElastiCache, S3, SQS, ALB, Lambda), Pulumi (TypeScript)
Security/CDN: Cloudflare (WAF, bot management, DNS)
Monitoring: Datadog APM
CI/CD: CircleCI
Backend: NestJS, Node.js (v24), TypeScript, Prisma ORM + raw SQL
Database: PostgreSQL 17, Redis (ElastiCache)
Queues: BullMQ
Data Pipelines: Airbyte, RudderStack
Chat: GetStream (Stream.io)
Analytics: Amplitude, AppsFlyer
Payments: Stripe, Apple IAP

WHY THIS ROLE MATTERS

Equals is scaling fast and the infrastructure needs to keep up. You'll be the person the team relies on to keep the platform healthy as we grow. When traffic spikes, you scale it. When something breaks, you fix it. When we need to ingest a billion rows of music catalog data, you make sure the database doesn't run out of storage. You'll have full autonomy over infrastructure decisions and a direct impact on every user's experience.

COMPENSATION

Competitive salary (£90k+) and equity package (£100k+)

PERKS

Laptop and other necessary equipment provided
Office lunch on the house when you visit
Regular socials with the team
Private health insurance

Site Reliability Engineer I in London employer: equals

Equals is an exceptional employer, offering a dynamic work environment where innovation and collaboration thrive. As a Site Reliability Engineer, you'll enjoy competitive compensation, comprehensive health benefits, and the opportunity to make a significant impact on our rapidly growing platform. With a strong focus on employee growth and regular team socials, Equals fosters a supportive culture that values your contributions and encourages professional development.

Contact Details:

equals Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Site Reliability Engineer I role in London

✨Tip Number 1

Network like a pro! Reach out to current or former employees at Equals on LinkedIn. A friendly chat can give you insider info and maybe even a referral, which can really boost your chances.

✨Tip Number 2

Show off your skills in real-time! If you get the chance, participate in tech meetups or hackathons related to site reliability engineering. It’s a great way to demonstrate your expertise and meet potential employers.

✨Tip Number 3

Prepare for the interview by brushing up on your AWS and PostgreSQL knowledge. Be ready to discuss specific projects where you’ve managed cloud infrastructure or optimised database performance. We love seeing practical examples!

✨Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in joining the Equals team.

We think you need these skills to ace the Site Reliability Engineer I role in London

AWS (ECS/Fargate, RDS, ElastiCache, S3, ALB, SQS)

Infrastructure-as-code (Pulumi, Terraform, CDK)

PostgreSQL performance tuning

Redis clustering and failover handling

Container orchestration and deployment strategies

Monitoring and observability (Datadog)

Incident response

CI/CD pipelines (CircleCI, GitHub Actions)

Data replication tools (Airbyte, Fivetran)

Event streaming platforms (RudderStack, Segment)

Cloudflare management (WAF, bot management)

Understanding of NestJS / Node.js backend architectures

Database performance optimization

Queue system reliability (BullMQ)

Some tips for your application 🫡

Tailor Your CV: Make sure your CV is tailored to the Site Reliability Engineer role. Highlight your experience with AWS, PostgreSQL, and any relevant infrastructure-as-code tools like Pulumi or Terraform. We want to see how your skills match what we're looking for!

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're passionate about the role and how your background makes you a great fit. Don’t forget to mention your experience with monitoring tools like Datadog and your approach to incident response.

Showcase Your Projects: If you've worked on any projects that demonstrate your skills in managing cloud infrastructure or CI/CD pipelines, make sure to include them. We love seeing real-world applications of your expertise, so don’t hold back!

Apply Through Our Website: We encourage you to apply through our website for the best chance of getting noticed. It helps us keep track of applications and ensures you’re considered for the role. Plus, it’s super easy to do!

How to prepare for a job interview at equals

✨Know Your Tech Stack

Make sure you’re well-versed in the technologies mentioned in the job description, especially AWS services like ECS, RDS, and ElastiCache. Brush up on your knowledge of PostgreSQL performance tuning and Redis management, as these will likely come up during technical discussions.

✨Demonstrate Problem-Solving Skills

Prepare to discuss past incidents you've managed, particularly those involving production issues. Be ready to explain your thought process during incident response, including how you diagnosed problems and implemented fixes. This shows you can stay calm under pressure and think critically.

✨Familiarise Yourself with Monitoring Tools

Since monitoring and observability are key parts of the role, get comfortable with Datadog or similar platforms. Understand how to set up alerts, manage dashboards, and reduce alert fatigue. Being able to talk about your experience with these tools will give you an edge.

✨Ask Insightful Questions

Prepare thoughtful questions about Equals' infrastructure and future challenges. Inquire about their current CI/CD practices or how they handle scaling during traffic spikes. This not only shows your interest in the role but also demonstrates your proactive mindset.
