At a Glance
- Tasks: Own and optimise the infrastructure for a fast-growing music platform.
- Company: Equals, the world's largest social music network, connecting over a million users.
- Benefits: Competitive salary, equity package, private health insurance, and office perks.
- Other info: Enjoy autonomy in your role and excellent career growth opportunities.
- Why this job: Make a real impact on user experience while working with cutting-edge technology.
- Qualifications: Strong AWS experience and deep PostgreSQL knowledge required.
The predicted salary is around £90,000 per year.
ABOUT EQUALS
Equals is the world's largest social music network with over a million users, growing exponentially month to month. We connect people through the music they love - fans discover new people, collect tracks, and connect in artist chatrooms. Our platform serves users across the world with real-time chat, music streaming, and a recommendation engine that matches people by musical taste.
THE ROLE
We're looking for a Site Reliability Engineer to own the infrastructure, observability, and operational health of the Equals platform. You'll monitor system needs and health to keep the user experience seamless, and provide traceability when systems come under strain or fail. This is a sole-ownership role: you'll be responsible for our entire cloud infrastructure, CI/CD pipelines, monitoring stack, data pipelines, and database performance. You'll work closely with our engineers, but your focus is the platform underneath, not feature development.
WHAT YOU'LL OWN
- Infrastructure & Cloud:
  - Manage and evolve our AWS infrastructure via Pulumi (TypeScript): ECS/Fargate services, RDS (PostgreSQL 17), ElastiCache (Redis with read replicas), S3, SQS, ALB, Lambda
  - Scale infrastructure up and down for large data operations (e.g. music catalog ingestion of 1B+ rows)
  - Manage Cloudflare (WAF, bot management, DNS, firewall rules)
  - Make cost-conscious infrastructure decisions: right-sizing instances, storage tiering, optimizing spend
- Monitoring & Observability:
  - Own the Datadog APM setup: tracing, alerting, dashboards, log management
  - Maintain and tune alert channels integrated with Slack
  - Reduce alert fatigue by tuning thresholds, suppressing false positives, and downgrading non-actionable errors
  - Be the first responder when something breaks in production
- Reliability & Incident Response:
  - Investigate and resolve production incidents end-to-end: detection, root cause analysis, fix, and post-mortem
  - Handle database performance issues: slow query identification, index creation, query optimization, connection pool tuning
  - Manage queue system reliability (BullMQ on Redis): concurrency tuning, rate limiting, stalled job handling, autoscaling
  - Ensure graceful handling of failovers, deployments, and edge cases across all services
- Data Pipelines & Warehouse:
  - Manage the Airbyte replication pipeline from production database to data warehouse
  - Configure incremental replication for new tables as the product evolves
  - Maintain the data warehouse (PostgreSQL): autovacuum tuning, memory parameters, capacity planning
  - Manage RudderStack for event streaming to analytics (Amplitude) and attribution (AppsFlyer)
- CI/CD & Deployment:
  - Own CircleCI pipeline configuration and reliability
  - Manage ECS deployment strategies, health checks, and rollout verification
  - Maintain test environments and cleanup automation
WHAT WE'RE LOOKING FOR
Must Have:
- Strong experience with AWS (ECS/Fargate, RDS, ElastiCache, S3, ALB, SQS at minimum)
- Infrastructure-as-code experience: ideally Pulumi, but Terraform or CDK background is fine
- Deep PostgreSQL knowledge: performance tuning, indexing strategies, query optimization, connection pooling
- Experience with Redis at scale: clustering, read replicas, failover handling
- Solid understanding of container orchestration and deployment strategies
- Experience with monitoring and observability platforms (Datadog preferred)
- Comfort with incident response: you've been paged at 2am and know how to stay calm, diagnose, and fix
- Familiarity with CI/CD pipelines (CircleCI, GitHub Actions, or similar)
Nice to Have:
- Experience with Pulumi specifically (TypeScript)
- Experience with data replication tools (Airbyte, Fivetran, or similar)
- Experience with event streaming platforms (RudderStack, Segment, or similar)
- Familiarity with BullMQ or similar Redis-based queue systems
- Experience with Cloudflare (WAF, bot management)
- Familiarity with NestJS / Node.js backend architectures (you won't be building features, but understanding how the backend works helps you support it)
- Experience scaling platforms through rapid growth phases
OUR STACK
- Infrastructure: AWS (ECS, RDS, ElastiCache, S3, SQS, ALB, Lambda), Pulumi (TypeScript)
- Security/CDN: Cloudflare (WAF, bot management, DNS)
- Monitoring: Datadog APM
- CI/CD: CircleCI
- Backend: NestJS, Node.js (v24), TypeScript, Prisma ORM + raw SQL
- Database: PostgreSQL 17, Redis (ElastiCache)
- Queues: BullMQ
- Data Pipelines: Airbyte, RudderStack
- Chat: GetStream (Stream.io)
- Analytics: Amplitude, AppsFlyer
- Payments: Stripe, Apple IAP
WHY THIS ROLE MATTERS
Equals is scaling fast and the infrastructure needs to keep up. You'll be the person the team relies on to keep the platform healthy as we grow. When traffic spikes, you scale it. When something breaks, you fix it. When we need to ingest a billion rows of music catalog data, you make sure the database doesn't run out of storage. You'll have full autonomy over infrastructure decisions and a direct impact on every user's experience.
COMPENSATION
Competitive salary (£90k+) and equity package (£100k+).
PERKS
- Laptop and other necessary equipment provided
- Office lunch on the house when you visit
- Regular socials with the team
- Private health insurance
Site Reliability Engineer in England, employer: Equals
Equals is an exceptional employer, offering a dynamic work environment where innovation meets passion for music. As a Site Reliability Engineer, you'll enjoy competitive compensation, private health insurance, and the opportunity to work autonomously on cutting-edge infrastructure that directly impacts user experience. With a strong focus on employee growth and regular team socials, Equals fosters a collaborative culture that values your contributions while you help shape the future of social music networking.
StudySmarter Expert Advice🤫
We think this is how you could land the Site Reliability Engineer role in England
✨Tip Number 1
Network like a pro! Reach out to current employees at Equals on LinkedIn or other platforms. Ask them about their experiences and any tips they might have for the interview process. It’s all about making connections!
✨Tip Number 2
Prepare for technical interviews by brushing up on your AWS skills and PostgreSQL knowledge. Practice common SRE scenarios and incident response strategies. We want you to feel confident when discussing your expertise!
✨Tip Number 3
Showcase your problem-solving skills during interviews. Be ready to discuss past incidents you've managed, how you approached them, and what you learned. This will demonstrate your reliability and ability to handle pressure.
✨Tip Number 4
Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in joining the Equals team!
Some tips for your application 🫡
Show Your Passion for Music: When you're writing your application, let your love for music shine through! Mention how music connects people and why you want to be part of a platform that celebrates this connection. It’ll show us that you’re not just looking for a job, but you genuinely care about what we do.
Tailor Your Experience: Make sure to highlight your relevant experience with AWS and infrastructure management. Use specific examples from your past roles that align with the responsibilities listed in the job description. We want to see how your skills can directly contribute to keeping our platform running smoothly!
Be Clear and Concise: Keep your application straightforward and to the point. Avoid jargon unless it’s necessary, and make sure your key achievements stand out. We appreciate clarity, and it helps us quickly understand how you can fit into our team.
Apply Through Our Website: Don’t forget to submit your application through our website! It’s the best way for us to receive your details and ensures you’re considered for the role. Plus, it makes the whole process smoother for everyone involved.
How to prepare for a job interview at Equals
✨Know Your Tech Stack
Make sure you’re well-versed in the technologies mentioned in the job description, especially AWS services like ECS, RDS, and ElastiCache. Brush up on your PostgreSQL performance tuning skills and be ready to discuss your experience with infrastructure-as-code tools like Pulumi or Terraform.
✨Demonstrate Problem-Solving Skills
Prepare to share specific examples of how you've handled production incidents in the past. Think about times when you had to diagnose issues under pressure, and be ready to explain your thought process during those situations.
✨Show Your Monitoring Expertise
Familiarise yourself with monitoring and observability platforms, particularly Datadog. Be prepared to discuss how you’ve set up alerting, dashboards, and log management in previous roles, and how you’ve reduced alert fatigue.
✨Understand the Bigger Picture
While this role focuses on infrastructure, having a grasp of how the backend works can be beneficial. Familiarise yourself with NestJS and Node.js concepts, as it will help you communicate effectively with the engineering team and support their needs.