AI Agent Reliability Engineer - Chaps

London | Full-Time | £48,000 - £72,000 / year (est.) | No home office possible
Craft Docs Limited, Inc.

At a Glance

  • Tasks: Join our AI Product team to enhance multi-agent systems for seamless productivity.
  • Company: Craft redefines productivity with innovative AI solutions that empower users.
  • Benefits: Enjoy a collaborative culture, flexible work options, and opportunities for personal growth.
  • Why this job: Be part of a cutting-edge team creating reliable AI assistants that users can trust.
  • Qualifications: Experience with LLM evaluation frameworks and strong Python/TypeScript skills required.
  • Other info: We value creativity, collaboration, and clear communication in our fast-paced environment.

The predicted salary is between £48,000 and £72,000 per year.

About Craft & Chaps

At Craft, we rethink productivity from first principles. Our products disappear into the background so people can do their life's work: fast, joyfully, and without friction. Chaps is our new AI-first product, focused on turning a constellation of large-language-model agents into a seamless personal productivity assistant.

About the role

Our AI Product team is looking for an engineer who obsesses over making multi-agent systems robust, observable, and continuously improving. You'll build the test harnesses, evaluation pipelines, and monitoring layers that keep dozens of collaborating agents on-task, on-budget, and on-time.

In practice, that means:

  • Designing automated evals that exercise complete agent workflows, catching regressions before they reach users.
  • Instrumenting every prompt, tool call, and model hop with rich telemetry so we can trace root causes in minutes, not days.
  • Creating feedback loops that turn logs, user ratings, and synthetic tests into better prompts and safer behaviours.
  • Future-proofing agentic systems so that quality improves as the underlying LLMs become more capable.

You will partner with product, research, and infra to ship an AI assistant users can trust: no surprises, no downtime.

What we're looking for

You must have:

  • Hands-on experience with LLM evaluation frameworks (e.g., OpenAI Evals, LangSmith, LLM-Harness) and a track record of turning eval results into product-ready gating.
  • Observability chops: you've wired up tracing/metrics for distributed systems (OpenTelemetry, Prometheus, Grafana) and know how to set SLOs that actually matter.
  • Prompt-engineering fluency (few-shot, function-calling, RAG orchestration) and an instinct for spotting ambiguity or jailbreak vectors.
  • Production-grade Python/TypeScript skills and comfort shipping through CI/CD (GitHub Actions, Terraform, Docker/K8s).
  • A bias for experimentation: you automate A/B tests, cost-latency trade-off studies, and rollback safeguards as part of the dev cycle.

It would be great if you have:

  • Experience scaling multi-agent planners or tool-using agents in real products.
  • Familiarity with vector databases, semantic diff tooling, or RLHF/RLAIF pipelines.
  • A knack for weaving human feedback (support tickets, thumbs-downs) into automated regression tests.

Our Culture

  • Think differently. We value novel ideas over legacy playbooks, and we give you room to explore.
  • People first. You instrument systems so users never feel the bumps; you collaborate so teammates never feel stuck.
  • Pragmatic craftsmanship. We ship fast, but we measure twice: data accuracy, latency budgets, and reliability all matter.
  • Clear communicators. You translate metrics into stories that product managers and designers understand, sparking better decisions.

Join us if you want to make AI that works: every request, every time.

AI Agent Reliability Engineer - Chaps employer: Craft Docs Limited, Inc.

At Craft, we foster a dynamic work environment that prioritises innovation and collaboration, making it an exceptional place for an AI Agent Reliability Engineer. Our culture encourages experimentation and values your unique ideas, while our commitment to employee growth ensures you have the resources and support to advance your career in the rapidly evolving field of AI. Located in a vibrant tech hub, you'll enjoy the benefits of a supportive community and access to cutting-edge technology, all while contributing to products that enhance productivity and user experience.

Contact Details:

Craft Docs Limited, Inc. Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the AI Agent Reliability Engineer - Chaps role

✨Tip Number 1

Familiarise yourself with the latest LLM evaluation frameworks like OpenAI Evals and LangSmith. Understanding these tools will not only help you in interviews but also demonstrate your hands-on experience, which is crucial for this role.

✨Tip Number 2

Showcase your ability to implement observability in distributed systems. Be prepared to discuss specific examples where you've used tools like Prometheus or Grafana to set meaningful SLOs, as this is a key requirement for the position.

✨Tip Number 3

Highlight any experience you have with prompt engineering and automation of A/B tests. Being able to talk about how you've improved workflows through experimentation will resonate well with the team’s focus on continuous improvement.

✨Tip Number 4

Prepare to discuss your production-grade Python or TypeScript skills, especially in the context of CI/CD processes. Sharing specific projects where you've successfully shipped code using GitHub Actions or Docker will strengthen your application.

We think you need these skills to ace the AI Agent Reliability Engineer - Chaps role

Hands-on experience with LLM evaluation frameworks
Observability skills for distributed systems
Proficiency in OpenTelemetry, Prometheus, Grafana
Ability to set meaningful SLOs
Fluency in prompt engineering techniques
Production-grade Python and TypeScript skills
Experience with CI/CD tools like GitHub Actions, Terraform, Docker/K8s
Bias for experimentation and automation of A/B tests
Experience scaling multi-agent planners or tool-using agents
Familiarity with vector databases and semantic diff tooling
Understanding of RLHF/RLAIF pipelines
Ability to integrate human feedback into automated regression tests
Strong communication skills for translating metrics into actionable insights

Some tips for your application 🫡

Understand the Role: Before applying, make sure you fully understand the responsibilities and requirements of the AI Agent Reliability Engineer position. Familiarise yourself with the technologies mentioned in the job description, such as LLM evaluation frameworks and observability tools.

Tailor Your CV: Customise your CV to highlight relevant experience and skills that align with the job description. Emphasise your hands-on experience with LLM evaluation frameworks, Python/TypeScript skills, and any relevant projects that showcase your ability to work with multi-agent systems.

Craft a Compelling Cover Letter: Write a cover letter that not only outlines your qualifications but also demonstrates your passion for AI and productivity. Mention specific examples of how you've contributed to similar projects or how your skills can help Craft & Chaps achieve their goals.

Showcase Your Problem-Solving Skills: In your application, provide examples of how you've tackled challenges in previous roles, particularly those related to automation, testing, and system reliability. Highlight your bias for experimentation and how it has led to successful outcomes in your past work.

How to prepare for a job interview at Craft Docs Limited, Inc.

✨Showcase Your Technical Skills

Be prepared to discuss your hands-on experience with LLM evaluation frameworks and how you've turned eval results into product-ready gating. Highlight specific projects where you implemented observability tools like OpenTelemetry or Grafana.

✨Demonstrate Problem-Solving Abilities

Expect questions that assess your ability to troubleshoot and improve multi-agent systems. Share examples of how you've created feedback loops or automated tests that enhanced system reliability and performance.

✨Communicate Clearly

Practice translating complex technical metrics into understandable stories. This role requires clear communication with product managers and designers, so be ready to explain your thought process and decisions in a straightforward manner.

✨Emphasise Your Collaborative Spirit

Craft values teamwork and collaboration. Be prepared to discuss how you've worked with cross-functional teams in the past, particularly in shipping AI products. Share experiences where you helped teammates overcome challenges.
