SRE/LLM Ops Engineer (UK)

Full-Time 48000 - 72000 £ / year (est.) No home office possible
CluePoints

At a Glance

  • Tasks: Ensure reliable LLM-powered services on Azure and Kubernetes while implementing deep observability.
  • Company: Join CluePoints, a fast-growing tech scale-up transforming clinical trials with AI.
  • Benefits: Enjoy private medical insurance, professional development opportunities, and a vibrant hybrid work culture.
  • Why this job: Make a real impact in healthcare by optimising AI-driven clinical trial processes.
  • Qualifications: 5+ years in SRE/DevOps with experience in LLM or ML applications.
  • Other info: Diverse team with over 40 nationalities, fostering collaboration and continuous learning.

The predicted salary is between £48,000 and £72,000 per year.

At CluePoints, we’re redefining how clinical trials are run. As the premier provider of Risk-Based Quality Management (RBQM) and Data Quality Oversight software, we harness advanced statistics, artificial intelligence, and machine learning to ensure the quality, accuracy, and integrity of clinical trial data, helping life sciences organisations bring safer, more effective treatments to patients faster.

We’re proud to be an ambitious, fast‑growing technology scale‑up with a dynamic and diverse international team representing more than 40 nationalities. Collaboration, flexibility, and continuous learning are part of our DNA. At CluePoints, you’ll find a culture where you can grow, make an impact, and have fun along the way.

Guided by our values of Care, Passion, and Smart Disruption, we’re united by a shared mission: to create smarter ways to run efficient clinical trials and deliver AI‑powered insights that improve human outcomes worldwide.

The Role

The SRE, LLMOps (AI Platform) ensures our LLM‑powered services are reliable, observable, and safe in production on Azure and Kubernetes. You’ll blend classic SRE disciplines with LLM‑specific operations: robust evaluation pipelines, prompt/version governance, model/vendor failover, guardrails, and cost/performance monitoring. You know how to build automation with LangChain/LangGraph, operate API‑based LLMs in production, and manage the inherent non‑determinism of models through rigorous testing and observability.

What You’ll Bring

  • Experience: 5+ years in SRE/DevOps/Platform Engineering with 1–2+ years operating LLM or ML‑backed applications in production (API‑based or hosted models).
  • LLMOps: hands‑on with LangChain/LangGraph building end‑to‑end chains/agents and RAG flows; comfort with vector stores (e.g., Azure AI Search, Pinecone), prompt/version control, and dataset tooling.
  • Observability: proficiency instrumenting LLM traces and app telemetry, alert tuning, and root‑cause analysis; familiarity with LangSmith and/or Arize Phoenix (token/cost tracking, latency, failure modes).
  • Cloud & platform: strong Azure and Kubernetes (AKS) background; GitOps (Flux/ArgoCD), Helm/Kustomize; CI/CD (GitHub Actions/GitLab/Jenkins); IaC (Terraform); secrets, networking, and security baselines.
  • Languages & tooling: Python (preferred) and one of TypeScript/Go; REST/GraphQL; OpenAI/Azure OpenAI/Anthropic APIs; experience with Redis caches, message queues, and feature flags.

What You’ll Be Doing

  • Instrument deep observability: implement tracing for LLM chains/agents (inputs, outputs, token usage, latency, model/version), correlate with app metrics/logs, and set actionable alerts; leverage LangSmith/Arize Phoenix (or similar) and OpenTelemetry where appropriate.
  • Safety & guardrails: integrate content safety, PII redaction, jailbreak/prompt‑injection defenses, and policy‑based rails; document exceptions and reviewer workflows. Prefer native platform features (e.g., Azure AI Content Safety) or programmable rails (e.g., NVIDIA NeMo Guardrails).
  • Cost & capacity management: monitor token and request costs, throughput, and rate limits; implement caching, request shaping, and multi‑tier model selection to balance quality, latency, and spend.
  • Build evaluation & testing pipelines: create golden datasets and automated evals (offline + CI/CD + canary) to catch regressions from code, prompt, data, or model changes; use LangSmith/OpenAI Evals (or equivalents) to track quality trends over time.
  • Platform operations on Azure/Kubernetes: ensure secure, compliant, and cost‑efficient operation; maintain IaC, secrets, networking, scaling, and DR/BCP; partner with Security and QA on regulated SaaS controls.
  • Cross‑functional enablement: work with product/dev teams to set acceptance criteria for AI features, add runtime feature flags/kill‑switches, and embed evals/telemetry from day one.
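
To make the evaluation-pipeline responsibility above more concrete, here is a minimal Python sketch of a golden-dataset regression gate. It is illustrative only and not CluePoints' actual tooling: `GoldenCase`, `pass_rate`, `gate`, the substring check, the sample questions, and `fake_model` (a stand-in for a real LLM call) are all assumptions made for the example. A production pipeline would more likely use LangSmith, OpenAI Evals, or an equivalent, with richer scoring than a substring match.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    """One hypothetical golden-dataset entry."""
    prompt: str
    must_contain: str  # substring the answer is expected to include


def pass_rate(model: Callable[[str], str], cases: list[GoldenCase]) -> float:
    """Return the fraction of cases whose output contains the expected text."""
    if not cases:
        raise ValueError("golden dataset is empty")
    passed = sum(
        1
        for case in cases
        if case.must_contain.lower() in model(case.prompt).lower()
    )
    return passed / len(cases)


def gate(
    model: Callable[[str], str],
    cases: list[GoldenCase],
    threshold: float = 0.9,
) -> bool:
    """Return True when the pass rate meets the (assumed) quality threshold."""
    return pass_rate(model, cases) >= threshold


# Hypothetical sample data, invented for this sketch.
GOLDEN = [
    GoldenCase("What does RBQM stand for?", "risk-based"),
    GoldenCase("Which cloud hosts the platform?", "azure"),
    GoldenCase("Which orchestrator is used?", "kubernetes"),
]


def fake_model(prompt: str) -> str:
    """Stand-in for an API-based LLM call; returns canned answers."""
    canned = {
        "What does RBQM stand for?": "Risk-Based Quality Management",
        "Which cloud hosts the platform?": "The platform runs on Azure.",
        "Which orchestrator is used?": "I am not sure.",
    }
    return canned.get(prompt, "")
```

In a CI job, a script along these lines could call `gate(...)` against the candidate prompt or model version and exit non-zero when it returns `False`, so that a quality regression blocks the change before it reaches production.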

CluePoints is an equal opportunities employer. We celebrate diversity and are committed to creating an inclusive environment for all employees and applicants. We welcome applications from all individuals regardless of age, disability, gender identity or expression, marital or civil partnership status, pregnancy or maternity, race, religion or belief, sex, or sexual orientation. Any personal data you share during your application will be processed in accordance with the UK GDPR and the Data Protection Act 2018 and will be used solely for recruitment purposes. By submitting your application, you consent to the processing of your data for recruitment and employment purposes.

SRE/LLM Ops Engineer (UK) employer: CluePoints

CluePoints is an exceptional employer that fosters a culture of collaboration, flexibility, and continuous learning, making it an ideal place for SRE/LLM Ops Engineers to thrive. With a commitment to employee growth through professional development opportunities and a vibrant social culture, you will be part of a dynamic team dedicated to redefining clinical trials using cutting-edge technology. Located in the UK, CluePoints offers a hub-based hybrid model that balances work-life integration with meaningful contributions to improving human outcomes worldwide.

Contact Details:

CluePoints Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the SRE/LLM Ops Engineer (UK) role

✨Tip Number 1

Network like a pro! Reach out to folks in the industry, attend meetups, and connect with CluePoints employees on LinkedIn. A friendly chat can sometimes lead to opportunities that aren’t even advertised!

✨Tip Number 2

Show off your skills! If you’ve got a portfolio or GitHub showcasing your work with LLMs or SRE practices, make sure it’s polished and ready to go. It’s a great way to demonstrate your expertise beyond just a CV.

✨Tip Number 3

Prepare for the interview by diving deep into CluePoints’ mission and values. Think about how your experience aligns with their goals in clinical trials and AI. Tailoring your answers to reflect their culture can really set you apart!

✨Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in joining the CluePoints team!

We think you need these skills to ace the SRE/LLM Ops Engineer (UK) application

Site Reliability Engineering (SRE)
DevOps
Platform Engineering
LLM Operations
LangChain
LangGraph
Azure
Kubernetes
GitOps
CI/CD
Infrastructure as Code (IaC)
Python
TypeScript
REST APIs
OpenTelemetry

Some tips for your application 🫡

Tailor Your CV: Make sure your CV is tailored to the SRE/LLM Ops Engineer role. Highlight your experience with Azure, Kubernetes, and LLM applications. We want to see how your skills align with what we're looking for!

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Share your passion for clinical trials and how your background in SRE/DevOps can contribute to our mission at CluePoints. Let us know why you’re excited about this opportunity!

Showcase Relevant Projects: If you've worked on any projects involving LangChain, LangGraph, or similar technologies, make sure to mention them. We love seeing real-world examples of your work that demonstrate your expertise and creativity.

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way to ensure your application gets into the right hands. Plus, it shows us you're keen on joining our team at CluePoints!

How to prepare for a job interview at CluePoints

✨Know Your Tech Stack

Make sure you’re well-versed in the technologies mentioned in the job description, especially Azure, Kubernetes, and the specific tools like LangChain and LangGraph. Brush up on your Python skills and be ready to discuss how you've used these technologies in past projects.

✨Demonstrate Problem-Solving Skills

Prepare to showcase your problem-solving abilities, particularly in relation to LLM operations and observability. Think of examples where you’ve tackled challenges in production environments, such as implementing tracing or managing costs effectively.

✨Show Your Collaborative Spirit

CluePoints values collaboration, so be ready to discuss how you’ve worked with cross-functional teams in the past. Share experiences where you’ve partnered with product or development teams to set acceptance criteria or implement features.

✨Ask Insightful Questions

Prepare thoughtful questions that show your interest in the role and the company’s mission. Inquire about their approach to AI safety and guardrails, or how they measure success in their LLM operations. This not only demonstrates your enthusiasm but also helps you gauge if the company is the right fit for you.
