Senior SRE Engineer in London

London | Full-Time | £60,000 - £80,000 / year (est.) | No working from home possible
Encord

At a Glance

  • Tasks: Join our team to optimise and maintain cutting-edge AI infrastructure.
  • Company: Encord, a leading AI data platform trusted by top companies.
  • Benefits: Competitive salary, equity, 25 days leave, and learning budget.
  • Other info: Dynamic culture with regular team events and excellent career growth.
  • Why this job: Make a real impact in the AI space while growing your skills.
  • Qualifications: Experience in SRE, DevOps, or platform engineering is essential.

The predicted salary is between £60,000 and £80,000 per year.

About us

Encord is the universal data layer for AI that helps 300+ AI teams train and run models on the right data. Our platform indexes, curates, annotates, and evaluates data across the full AI lifecycle, from development through production. Trusted by Woven by Toyota, AXA, UiPath, Zipline, and more. We're an ambitious team of 100+ working at the frontier of AI and have raised $60M in Series C funding from Wellington Management, CRV, Next47 and Y Combinator.

The role

We're looking for a Senior Site Reliability Engineer to join our growing platform engineering team. You'll be embedded in the teams building and operating Encord's core infrastructure, ensuring our platform is performant, reliable, observable, and scalable. You will lead the planning and execution of the work needed as we grow our customer base from hundreds to thousands of AI teams worldwide, and the volume of AI training and supervision data managed by our platform from terabytes to petabytes. You'll drive a culture of performant and resilient software through individual contributions and collaboration with multiple squads.

What You'll Do

  • Performance & Capacity: Profile and optimise services handling large-scale data pipelines; perform capacity planning for storage and compute-intensive workloads. Work with squads to establish performance benchmarks and expectations.
  • Collaboration: Partner closely with backend and ML engineers to improve deployment pipelines (CI/CD), review infrastructure changes, and champion reliability best practices.
  • Reliability & Availability: Define and own SLIs/SLOs/SLAs for critical services; build alerting, runbooks, and incident response processes; lead postmortems with a blameless culture.
  • Infrastructure & Cloud: Design, deploy, and maintain cloud infrastructure on GCP and AWS; manage Kubernetes clusters, networking, and storage at petabyte scale.
  • Automation & Tooling: Work to improve developer productivity and guide and review automation and tooling efforts across the engineering group.
  • Observability: Instrument services with distributed tracing, logging, and metrics (Prometheus, Grafana, OpenTelemetry, Datadog or similar); build infrastructure, define best practices, and work with each squad to ensure every service is observable before it goes to production.

What We're Looking For

  • Hands-on experience in SRE, DevOps, platform engineering, or a similar discipline in a production environment.
  • Strong fundamentals in designing, building and maintaining resilient distributed and/or high performance systems.
  • Solid understanding of networking, operating systems and database technologies.
  • Experience with observability fundamentals: metrics, logs, traces, and alerting.
  • Comfortable with on-call rotations and incident management.

Tech stack

We are technology agnostic at Encord and are not looking for experience across all of these. As long as you're open to learning, please apply.

  • Backend: Python and Rust
  • Frontend: TypeScript and React
  • Deployment: Kubernetes
  • Infrastructure: GCP

Why Encord

  • Competitive salary, commission, and meaningful equity in a high-growth startup.
  • Strong in-person culture: most of the team works from our London office 4+ days/week.
  • 25 days annual leave + UK public holidays.
  • Annual learning & development budget.
  • Travel for customer visits, events, and conferences across the UK and Europe.
  • Company lunches twice a week.
  • Monthly socials & bi-annual team offsites.

Senior SRE Engineer in London employer: Encord

Encord is an exceptional employer for those seeking to make a significant impact in the AI industry. With a strong emphasis on collaboration and innovation, employees enjoy a vibrant work culture in our London office, complete with competitive salaries, generous annual leave, and ample opportunities for professional growth through learning budgets and team events. Join us to be part of a dynamic team that values resilience and performance while working at the forefront of technology.

We think you need these skills to succeed as a Senior SRE Engineer in London

Site Reliability Engineering
DevOps
Platform Engineering
Performance Optimisation
Capacity Planning
CI/CD
SLIs/SLOs/SLAs Management