SRE/LLM Ops Engineer (UK) in London

London, Full-Time, £48,000 to £72,000 per year (est.), no home office possible
CluePoints

At a Glance

  • Tasks: Ensure reliable LLM services on Azure and Kubernetes while building automation and monitoring performance.
  • Company: Join CluePoints, a fast-growing tech scale-up transforming clinical trials with AI and machine learning.
  • Benefits: Enjoy private medical insurance, professional development opportunities, and a vibrant hybrid work culture.
  • Why this job: Make a real impact in healthcare by optimising AI-powered insights for better patient outcomes.
  • Qualifications: 5+ years in SRE/DevOps with experience in LLM or ML applications; strong Azure and Kubernetes skills.
  • Other info: Be part of a diverse team united by a mission to innovate clinical trial processes.

The predicted salary is between £48,000 and £72,000 per year.

At CluePoints, we’re redefining how clinical trials are run. As the premier provider of Risk-Based Quality Management (RBQM) and Data Quality Oversight software, we harness advanced statistics, artificial intelligence, and machine learning to ensure the quality, accuracy, and integrity of clinical trial data, helping life sciences organisations bring safer, more effective treatments to patients faster. We’re proud to be an ambitious, fast-growing technology scale-up with a dynamic and diverse international team representing more than 40 nationalities. Collaboration, flexibility, and continuous learning are part of our DNA. Guided by our values of Care, Passion, and Smart Disruption, we’re united by a shared mission: to create smarter ways to run efficient clinical trials and deliver AI-powered insights that improve human outcomes worldwide.

In the SRE, LLMOps (AI Platform) role, you'll ensure our LLM-powered services are reliable, observable, and safe in production on Azure and Kubernetes. You'll blend classic SRE disciplines with LLM-specific operations: robust evaluation pipelines, prompt/version governance, model/vendor failover, guardrails, and cost/performance monitoring. You know how to build automation with LangChain/LangGraph, operate API-based LLMs in production, and manage the inherent non-determinism of models through rigorous testing and observability.

What You’ll Bring

  • Experience: 5+ years in SRE/DevOps/Platform Engineering with 1–2+ years operating LLM or ML-backed applications in production (API-based or hosted models).
  • LLMOps: hands-on with LangChain/LangGraph building end-to-end chains/agents and RAG flows; comfort with vector stores (e.g., Azure AI Search, Pinecone), prompt/version control, and dataset tooling.
  • Observability: proficiency instrumenting LLM traces and app telemetry, alert tuning, and root-cause analysis; familiarity with LangSmith and/or Arize Phoenix (token/cost tracking, latency, failure modes).
  • Cloud & platform: strong Azure and Kubernetes (AKS) background; GitOps (Flux/ArgoCD), Helm/Kustomize; CI/CD (GitHub Actions/GitLab/Jenkins); IaC (Terraform); secrets, networking, and security baselines.
  • Languages & tooling: Python (preferred) and one of TypeScript/Go; REST/GraphQL; OpenAI/Azure OpenAI/Anthropic APIs; experience with Redis caches, message queues, and feature flags.

Job Responsibilities

  • Instrument deep observability: implement tracing for LLM chains/agents (inputs, outputs, token usage, latency, model/version), correlate with app metrics/logs, and set actionable alerts; leverage LangSmith/Arize Phoenix (or similar) and OpenTelemetry where appropriate.
  • Safety & guardrails: integrate content safety, PII redaction, jailbreak/prompt-injection defenses, and policy-based rails; document exceptions and reviewer workflows. Prefer native platform features (e.g., Azure AI Content Safety) or programmable rails (e.g., NVIDIA NeMo Guardrails).
  • Cost & capacity management: monitor token and request costs, throughput, and rate limits; implement caching, request shaping, and multi-tier model selection to balance quality, latency, and spend.
  • Build evaluation & testing pipelines: create golden datasets and automated evals (offline + CI/CD + canary) to catch regressions from code, prompt, data, or model changes; use LangSmith/OpenAI Evals (or equivalents) to track quality trends over time.
  • Platform operations on Azure/Kubernetes: ensure secure, compliant, and cost-efficient operation; maintain IaC, secrets, networking, scaling, and DR/BCP; partner with Security and QA on regulated SaaS controls.
  • Cross-functional enablement: work with product/dev teams to set acceptance criteria for AI features, add runtime feature flags/kill-switches, and embed evals/telemetry from day one.

Job Benefits

  • Private Medical Insurance through Vitality Health (full hospital cover, 24/7 GP, and therapy sessions)
  • Group Critical Illness Cover with Aviva Life Insurance (death-in-service lump sum)
  • Pension Scheme with 9% employer contribution via Scottish Widows
  • Opportunities for professional development and sponsored certifications
  • A hub-based hybrid model that blends flexibility with purpose, connecting teams through collaboration, learning, and a vibrant social culture

Equal Opportunities & Data Protection Statement

CluePoints is an equal opportunities employer. We celebrate diversity and are committed to creating an inclusive environment for all employees and applicants. We welcome applications from all individuals regardless of age, disability, gender identity or expression, marital or civil partnership status, pregnancy or maternity, race, religion or belief, sex, or sexual orientation. Any personal data you share during your application will be processed in accordance with the UK GDPR and the Data Protection Act 2018 and will be used solely for recruitment purposes. By submitting your application, you consent to the processing of your data for recruitment and employment purposes.

CluePoints as an employer

At CluePoints, we pride ourselves on being an exceptional employer, offering a vibrant work culture that fosters collaboration and continuous learning. Our commitment to employee growth is reflected in our professional development opportunities and sponsored certifications, all within a flexible hub-based hybrid model that encourages teamwork and innovation. Join us in our mission to revolutionise clinical trials while enjoying comprehensive benefits like private medical insurance and a generous pension scheme.

Contact Details:

CluePoints Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the SRE/LLM Ops Engineer (UK) role in London

✨Tip Number 1

Network like a pro! Reach out to folks in the industry on LinkedIn or at meetups. A friendly chat can sometimes lead to job opportunities that aren't even advertised.

✨Tip Number 2

Show off your skills! Create a portfolio or GitHub repo showcasing your projects, especially those related to SRE and LLM Ops. This gives potential employers a taste of what you can do.

✨Tip Number 3

Prepare for interviews by practising common questions and scenarios specific to SRE and LLM Ops. We recommend doing mock interviews with friends or using online platforms to get comfortable.

✨Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you're genuinely interested in joining our team.

We think you need these skills to ace the SRE/LLM Ops Engineer (UK) role in London

SRE (Site Reliability Engineering)
DevOps
Platform Engineering
LLM Operations
LangChain
LangGraph
Azure AI Search
Kubernetes (AKS)
GitOps (Flux/ArgoCD)
CI/CD (GitHub Actions/GitLab/Jenkins)
Infrastructure as Code (Terraform)
Python
TypeScript or Go
REST/GraphQL APIs
OpenTelemetry

Some tips for your application 🫡

Tailor Your CV: Make sure your CV reflects the skills and experiences that match the SRE/LLM Ops Engineer role. Highlight your experience with Azure, Kubernetes, and LLM operations to show us you’re the right fit!

Craft a Compelling Cover Letter: Use your cover letter to tell us why you're passionate about the role and how your background aligns with our mission at CluePoints. Be genuine and let your personality shine through!

Showcase Your Projects: If you've worked on relevant projects, don’t hesitate to include them! Whether it’s automation with LangChain or observability tools, we want to see what you’ve done and how it relates to the job.

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role without any hiccups!

How to prepare for a job interview at CluePoints

✨Know Your Tech Stack

Make sure you’re well-versed in the technologies mentioned in the job description, especially Azure, Kubernetes, and specific tools such as LangChain and LangGraph. Brush up on your Python skills and be ready to discuss how you've used these technologies in past projects.

✨Demonstrate Problem-Solving Skills

Prepare to showcase your problem-solving abilities, particularly in relation to LLM operations. Think of examples where you've tackled issues like model failover or cost/performance monitoring, and be ready to explain your thought process and the outcomes.

✨Showcase Your Collaboration Experience

Since CluePoints values collaboration, come prepared with examples of how you've worked cross-functionally in previous roles. Highlight any experiences where you partnered with product or development teams to set acceptance criteria or implement features.

✨Ask Insightful Questions

At the end of the interview, don’t forget to ask questions that show your interest in the role and the company. Inquire about their approach to AI safety and guardrails, or how they measure success in their LLM operations. This not only shows your enthusiasm but also helps you gauge if the company is the right fit for you.
