At a Glance
- Tasks: Own and optimise the infrastructure for a fast-growing music platform.
- Company: Equals, the world's largest social music network with over a million users.
- Benefits: Competitive salary, equity package, private health insurance, and regular team socials.
- Other info: Enjoy autonomy in your role and be part of a dynamic, rapidly scaling team.
- Why this job: Make a real impact on user experience while working with cutting-edge technology.
- Qualifications: Strong AWS experience, PostgreSQL knowledge, and incident response skills required.
The predicted salary is between £90,000 and £100,000 per year.
ABOUT EQUALS
Equals is the world's largest social music network with over a million users, growing exponentially month to month. We connect people through the music they love - fans discover new people, collect tracks, and connect in artist chatrooms. Our platform serves users across the world with real-time chat, music streaming, and a recommendation engine that matches people by musical taste.
THE ROLE
We're looking for a Site Reliability Engineer to own the infrastructure, observability, and operational health of the Equals platform. You'll monitor system health and capacity needs to keep the user experience seamless, and make sure failures and resource demands are traceable. This is a sole-ownership role: you'll be responsible for our entire cloud infrastructure, CI/CD pipelines, monitoring stack, data pipelines, and database performance. You'll work closely with our engineers, but your focus is the platform underneath, not feature development.
WHAT YOU'LL OWN
- Infrastructure & Cloud
  - Manage and evolve our AWS infrastructure via Pulumi (TypeScript): ECS/Fargate services, RDS (PostgreSQL 17), ElastiCache (Redis with read replicas), S3, SQS, ALB, Lambda
  - Scale infrastructure up and down for large data operations (e.g. music catalog ingestion of 1B+ rows)
  - Manage Cloudflare (WAF, bot management, DNS, firewall rules)
  - Make cost-conscious infrastructure decisions - right-sizing instances, storage tiering, optimizing spend
- Monitoring & Observability
  - Own the Datadog APM setup: tracing, alerting, dashboards, log management
  - Maintain and tune alert channels integrated with Slack
  - Reduce alert fatigue by tuning thresholds, suppressing false positives, and downgrading non-actionable errors
  - Be the first responder when something breaks in production
- Reliability & Incident Response
  - Investigate and resolve production incidents end-to-end: detection, root cause analysis, fix, and post-mortem
  - Handle database performance issues: slow query identification, index creation, query optimization, connection pool tuning
  - Manage queue system reliability (BullMQ on Redis): concurrency tuning, rate limiting, stalled job handling, autoscaling
  - Ensure graceful handling of failovers, deployments, and edge cases across all services
- Data Pipelines & Warehouse
  - Manage the Airbyte replication pipeline from production database to data warehouse
  - Configure incremental replication for new tables as the product evolves
  - Maintain the data warehouse (PostgreSQL): autovacuum tuning, memory parameters, capacity planning
  - Manage RudderStack for event streaming to analytics (Amplitude) and attribution (AppsFlyer)
- CI/CD & Deployment
  - Own CircleCI pipeline configuration and reliability
  - Manage ECS deployment strategies, health checks, and rollout verification
  - Maintain test environments and cleanup automation
WHAT WE'RE LOOKING FOR
- Must Have
  - Strong experience with AWS (ECS/Fargate, RDS, ElastiCache, S3, ALB, SQS at minimum)
  - Infrastructure-as-code experience - ideally Pulumi, but Terraform or CDK background is fine
  - Deep PostgreSQL knowledge: performance tuning, indexing strategies, query optimization, connection pooling
  - Experience with Redis at scale: clustering, read replicas, failover handling
  - Solid understanding of container orchestration and deployment strategies
  - Experience with monitoring and observability platforms (Datadog preferred)
  - Comfort with incident response: you've been paged at 2am and know how to stay calm, diagnose, and fix
  - Familiarity with CI/CD pipelines (CircleCI, GitHub Actions, or similar)
- Nice to Have
  - Experience with Pulumi specifically (TypeScript)
  - Experience with data replication tools (Airbyte, Fivetran, or similar)
  - Experience with event streaming platforms (RudderStack, Segment, or similar)
  - Familiarity with BullMQ or similar Redis-based queue systems
  - Experience with Cloudflare (WAF, bot management)
  - Familiarity with NestJS / Node.js backend architectures (you won't be building features, but understanding how the backend works helps you support it)
  - Experience scaling platforms through rapid growth phases
OUR STACK
- Infrastructure: AWS (ECS, RDS, ElastiCache, S3, SQS, ALB, Lambda), Pulumi (TypeScript)
- Security/CDN: Cloudflare (WAF, bot management, DNS)
- Monitoring: Datadog APM
- CI/CD: CircleCI
- Backend: NestJS, Node.js (v24), TypeScript, Prisma ORM + raw SQL
- Database: PostgreSQL 17, Redis (ElastiCache)
- Queues: BullMQ
- Data Pipelines: Airbyte, RudderStack
- Chat: GetStream (Stream.io)
- Analytics: Amplitude, AppsFlyer
- Payments: Stripe, Apple IAP
WHY THIS ROLE MATTERS
Equals is scaling fast and the infrastructure needs to keep up. You'll be the person the team relies on to keep the platform healthy as we grow. When traffic spikes, you scale it. When something breaks, you fix it. When we need to ingest a billion rows of music catalog data, you make sure the database doesn't run out of storage. You'll have full autonomy over infrastructure decisions and a direct impact on every user's experience.
COMPENSATION
Competitive salary (£90k+) and equity package (£100k+)
PERKS
- Laptop and other necessary equipment provided
- Office lunch on the house when you visit
- Regular socials with the team
- Private health insurance
Site Reliability Engineer (Home-based) in London. Employer: Equals
Contact Details:
Equals Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Site Reliability Engineer (Home-based) role in London
✨Tip Number 1
Network like a pro! Reach out to people in the industry, attend meetups, and connect with Equals on social media. You never know who might have the inside scoop on job openings or can put in a good word for you.
✨Tip Number 2
Show off your skills! Create a personal project or contribute to open-source that showcases your expertise in AWS, PostgreSQL, or any of the tools mentioned in the job description. This gives you something tangible to discuss during interviews.
✨Tip Number 3
Prepare for technical interviews by brushing up on your incident response strategies and database performance tuning. Practice common scenarios you might face as a Site Reliability Engineer, so you're ready to impress when it counts.
✨Tip Number 4
Apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in being part of the Equals team.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV is tailored to the Site Reliability Engineer role. Highlight your experience with AWS, PostgreSQL, and any relevant infrastructure-as-code tools like Pulumi or Terraform. We want to see how your skills match what we're looking for!
Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Share your passion for site reliability and how you can contribute to Equals' growth. Mention specific projects or experiences that relate to our tech stack and the responsibilities of the role.
Showcase Your Problem-Solving Skills: In your application, don’t just list your skills—show us how you've used them to solve real problems. Whether it’s handling production incidents or optimising database performance, we want to hear about your hands-on experience!
Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way to ensure your application gets into the right hands. Plus, it shows us you're genuinely interested in joining our team at Equals!
How to prepare for a job interview at Equals
✨Know Your Tech Stack
Make sure you’re well-versed in the technologies mentioned in the job description, especially AWS services like ECS, RDS, and ElastiCache. Brush up on your knowledge of PostgreSQL performance tuning and Redis management, as these will likely come up during technical discussions.
✨Demonstrate Problem-Solving Skills
Prepare to discuss past incidents you've managed, particularly those involving production issues. Be ready to walk through your thought process for diagnosing problems and implementing solutions, showcasing your calmness under pressure.
✨Familiarise Yourself with Monitoring Tools
Since monitoring and observability are key parts of the role, get comfortable with Datadog or similar platforms. You might be asked how you would set up alerts or manage log data, so having a few examples in mind can really help.
✨Show Your Ownership Mindset
This role is all about taking ownership of the infrastructure. Be prepared to discuss how you’ve previously taken charge of projects or systems, and how you approach making cost-effective decisions while ensuring reliability and performance.