At a Glance
- Tasks: Join our AI Product team to enhance multi-agent systems for seamless productivity.
- Company: Craft redefines productivity with innovative AI solutions that empower users.
- Benefits: Enjoy a collaborative culture, flexible work options, and opportunities for personal growth.
- Why this job: Be part of a cutting-edge team creating reliable AI assistants that users can trust.
- Qualifications: Experience with LLM evaluation frameworks and strong Python/TypeScript skills required.
- Other info: We value creativity, collaboration, and clear communication in our fast-paced environment.
The predicted salary is between £48,000 and £72,000 per year.
About Craft & Chaps
At Craft, we rethink productivity from first principles. Our products disappear into the background so people can do their life's work: fast, joyfully, and without friction. Chaps is our new AI-first product, focused on turning a constellation of large-language-model agents into a seamless personal productivity assistant.
About the role
Our AI Product team is looking for an engineer who obsesses over making multi-agent systems robust, observable, and continuously improving. You'll build the test harnesses, evaluation pipelines, and monitoring layers that keep dozens of collaborating agents on-task, on-budget, and on-time.
In practice, that means:
- Designing automated evals that exercise complete agent workflows, catching regressions before they reach users.
- Instrumenting every prompt, tool-call, and model hop with rich telemetry so we can trace root causes in minutes, not days.
- Creating feedback loops that turn logs, user ratings, and synthetic tests into better prompts and safer behaviours.
- Future-proofing agentic systems so that evaluation quality and behaviour keep pace as the underlying LLMs improve.
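As an illustration of the first responsibility, a workflow-level regression eval could be as simple as the sketch below. Everything here is hypothetical (the `EvalCase` type, `run_eval`, the stub agent, and the pass-rate gate stand in for a real multi-agent pipeline and golden-test suite), not a description of Craft's actual harness:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    """One golden test: a prompt and a substring the agent's answer must contain."""
    prompt: str
    expected_substring: str

def run_eval(agent: Callable[[str], str], cases: list[EvalCase], gate: float = 0.9) -> bool:
    """Run the agent over all golden cases; pass only if the rate meets the gate."""
    passed = sum(case.expected_substring in agent(case.prompt) for case in cases)
    rate = passed / len(cases)
    print(f"pass rate: {rate:.0%}")
    return rate >= gate

# Stub standing in for a real multi-agent pipeline.
def stub_agent(prompt: str) -> str:
    return "Meeting scheduled for Tuesday." if "schedule" in prompt else "I can't help with that."

cases = [
    EvalCase("Please schedule a meeting with the team", "scheduled"),
    EvalCase("schedule a call with Dana tomorrow", "scheduled"),
]
ok = run_eval(stub_agent, cases, gate=1.0)  # gate the release on a 100% pass rate
```

In a CI pipeline, a `False` return from a gate like this would block the deploy, which is one concrete way "eval results" become "product-ready gating."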
You will partner with product, research, and infra to ship an AI assistant users can trust: no surprises, no downtime.
What we're looking for
You must have:
- Hands-on experience with LLM evaluation frameworks (e.g., OpenAI Evals, LangSmith, lm-evaluation-harness) and a track record of turning eval results into product-ready gating.
- Observability chops: you've wired up tracing and metrics for distributed systems (OpenTelemetry, Prometheus, Grafana) and know how to set SLOs that actually matter.
- Prompt-engineering fluency: few-shot prompting, function calling, and RAG orchestration, plus an instinct for spotting ambiguity or jailbreak vectors.
- Production-grade Python/TypeScript skills and comfort shipping through CI/CD (GitHub Actions, Terraform, Docker/K8s).
- A bias for experimentation: you automate A/B tests, cost-latency trade-off studies, and rollback safeguards as part of the dev cycle.
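To make the observability requirement concrete: instrumenting each model and tool hop means wrapping it in a span that records duration and attributes. A production system would use OpenTelemetry and export to a tracing backend; the stdlib-only sketch below (the `span` helper, the `SPANS` list, and the model/tool names are all illustrative) shows the shape of the idea:

```python
import time
from contextlib import contextmanager

SPANS: list[dict] = []  # in a real system, spans are exported to a tracing backend

@contextmanager
def span(name: str, **attrs):
    """Record wall-clock duration and attributes for one agent step."""
    start = time.perf_counter()
    try:
        yield
    finally:
        SPANS.append({"name": name, "ms": (time.perf_counter() - start) * 1000, **attrs})

with span("llm_call", model="hypothetical-model", prompt_tokens=42):
    time.sleep(0.01)  # stand-in for a model request

with span("tool_call", tool="calendar.create_event"):
    time.sleep(0.005)  # stand-in for a tool invocation

for s in SPANS:
    print(f"{s['name']}: {s['ms']:.1f} ms")
```

With every prompt, tool call, and model hop tagged like this, tracing a slow or failing request back to the offending step becomes a query over spans rather than a log-spelunking exercise.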
It would be great if you have:
- Experience scaling multi-agent planners or tool-using agents in real products.
- Familiarity with vector databases, semantic diff tooling, or RLHF/RLAIF pipelines.
- A knack for weaving human feedback (support tickets, thumbs-downs) into automated regression tests.
Our Culture
- Think differently. We value novel ideas over legacy playbooks, and we give you room to explore.
- People first. You instrument systems so users never feel the bumps; you collaborate so teammates never feel stuck.
- Pragmatic craftsmanship. We ship fast, but we measure twice: data accuracy, latency budgets, and reliability all matter.
- Clear communicators. You translate metrics into stories that product managers and designers understand, sparking better decisions.
Join us if you want to make AI that works: every request, every time.
AI Agent Reliability Engineer - Chaps employer: Craft Docs Limited, Inc.
Contact Detail:
Craft Docs Limited, Inc. Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land AI Agent Reliability Engineer - Chaps
✨Tip Number 1
Familiarise yourself with the latest LLM evaluation frameworks like OpenAI Evals and LangSmith. Understanding these tools will not only help you in interviews but also demonstrate your hands-on experience, which is crucial for this role.
✨Tip Number 2
Showcase your ability to implement observability in distributed systems. Be prepared to discuss specific examples where you've used tools like Prometheus or Grafana to set meaningful SLOs, as this is a key requirement for the position.
✨Tip Number 3
Highlight any experience you have with prompt engineering and automation of A/B tests. Being able to talk about how you've improved workflows through experimentation will resonate well with the team’s focus on continuous improvement.
✨Tip Number 4
Prepare to discuss your production-grade Python or TypeScript skills, especially in the context of CI/CD processes. Sharing specific projects where you've successfully shipped code using GitHub Actions or Docker will strengthen your application.
We think you need these skills to ace AI Agent Reliability Engineer - Chaps
Some tips for your application 🫡
Understand the Role: Before applying, make sure you fully understand the responsibilities and requirements of the AI Agent Reliability Engineer position. Familiarise yourself with the technologies mentioned in the job description, such as LLM evaluation frameworks and observability tools.
Tailor Your CV: Customise your CV to highlight relevant experience and skills that align with the job description. Emphasise your hands-on experience with LLM evaluation frameworks, Python/TypeScript skills, and any relevant projects that showcase your ability to work with multi-agent systems.
Craft a Compelling Cover Letter: Write a cover letter that not only outlines your qualifications but also demonstrates your passion for AI and productivity. Mention specific examples of how you've contributed to similar projects or how your skills can help Craft & Chaps achieve their goals.
Showcase Your Problem-Solving Skills: In your application, provide examples of how you've tackled challenges in previous roles, particularly those related to automation, testing, and system reliability. Highlight your bias for experimentation and how it has led to successful outcomes in your past work.
How to prepare for a job interview at Craft Docs Limited, Inc.
✨Showcase Your Technical Skills
Be prepared to discuss your hands-on experience with LLM evaluation frameworks and how you've turned eval results into product-ready gating. Highlight specific projects where you implemented observability tools like OpenTelemetry or Grafana.
✨Demonstrate Problem-Solving Abilities
Expect questions that assess your ability to troubleshoot and improve multi-agent systems. Share examples of how you've created feedback loops or automated tests that enhanced system reliability and performance.
✨Communicate Clearly
Practice translating complex technical metrics into understandable stories. This role requires clear communication with product managers and designers, so be ready to explain your thought process and decisions in a straightforward manner.
✨Emphasise Your Collaborative Spirit
Craft values teamwork and collaboration. Be prepared to discuss how you've worked with cross-functional teams in the past, particularly in shipping AI products. Share experiences where you helped teammates overcome challenges.