Evaluation Scenario Writer - AI Agent Testing Specialist in London

London | Part-Time | £13 - 16 / hour (est.) | Home office (partial)

At a Glance

  • Tasks: Design evaluation scenarios for AI agents and create structured test cases.
  • Company: Join Mindrift, where innovation meets opportunity in the AI space.
  • Benefits: Earn up to $50/hour, enjoy flexible remote work, and enhance your portfolio.
  • Why this job: Shape the future of AI while working on exciting projects that fit your schedule.
  • Qualifications: Degree in relevant fields and background in QA or data analysis.
  • Other info: Entry-level position with great potential for growth in a dynamic environment.

The predicted salary is between £13 and £16 per hour.

This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English.

What We Do

At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI. The Mindrift platform connects specialists with AI projects from major tech innovators. Our mission is to unlock the potential of Generative AI by tapping into real-world expertise from across the globe.

About The Role

We’re looking for someone who can design realistic and structured evaluation scenarios for LLM-based agents. You’ll create test cases that simulate human-performed tasks and define gold-standard behaviour to compare agent actions against. You’ll work to ensure each scenario is clearly defined, well-scored, and easy to execute and reuse. You’ll need a sharp analytical mindset, attention to detail, and an interest in how AI agents make decisions.

Although every project is unique, you might typically:

  • Create structured test cases that simulate complex human workflows
  • Define gold-standard behaviour and scoring logic to evaluate agent actions
  • Analyse agent logs, failure modes, and decision paths
  • Work with code repositories and test frameworks to validate your scenarios
  • Iterate on prompts, instructions, and test cases to improve clarity and difficulty
  • Ensure that scenarios are production-ready, easy to run, and reusable
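
As a rough illustration of what a structured test case with a gold-standard path and scoring logic might look like, here is a minimal Python sketch. The scenario fields (`id`, `task`, `gold_path`), the action names, and the `score_run` helper are all hypothetical and are not Mindrift's actual format; the scoring rule (fraction of gold steps matched in order) is just one simple choice among many.

```python
import json

# Hypothetical scenario: field names and action names are illustrative only.
SCENARIO_JSON = """
{
  "id": "refund-request-001",
  "task": "Process a customer refund request",
  "gold_path": ["open_ticket", "look_up_order", "check_policy", "issue_refund"]
}
"""


def score_run(gold_path, agent_actions):
    """Return the fraction of gold steps the agent performed in the right order.

    Extra agent actions are ignored; a gold step only counts if it appears
    after the previously matched gold step.
    """
    if not gold_path:
        return 1.0
    matched = 0
    position = 0  # index in agent_actions where the next search starts
    for step in gold_path:
        try:
            position = agent_actions.index(step, position) + 1
            matched += 1
        except ValueError:
            # Step missing (or out of order): skip it, keep looking for later steps.
            continue
    return matched / len(gold_path)


scenario = json.loads(SCENARIO_JSON)
agent_log = ["open_ticket", "look_up_order", "issue_refund"]  # skipped the policy check
print(score_run(scenario["gold_path"], agent_log))  # 0.75
```

Keeping the scenario as plain JSON and the scorer as a pure function is one way to make a test case easy to rerun and reuse across agents.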

How To Get Started

Simply apply to this post, qualify, and get the chance to contribute to projects aligned with your skills, on your own schedule. From creating training prompts to refining model responses, you’ll help shape the future of AI while ensuring technology benefits everyone.

Requirements

  • Bachelor's and/or Master's Degree in Computer Science, Software Engineering, Data Science / Data Analytics, Artificial Intelligence / Machine Learning, Computational Linguistics / Natural Language Processing (NLP), Information Systems or other related fields.
  • Background in QA, software testing, data analysis, or NLP annotation
  • Good understanding of test design principles (e.g., reproducibility, coverage, edge cases)
  • Strong written communication skills in English
  • Comfortable with structured formats like JSON/YAML for scenario description
  • Able to define expected agent behaviours (gold paths) and scoring logic
  • Basic experience with Python and JS
  • Curious and open to working with AI-generated content, agent logs, and prompt-based behaviour

Nice to Have

  • Experience in writing manual or automated test cases
  • Familiarity with LLM capabilities and typical failure modes
  • Understanding of scoring metrics (precision, recall, coverage, reward functions)
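
For readers less familiar with the scoring metrics named above, the following sketch shows how precision and recall could be computed for an agent's actions against a gold-standard set. The `precision_recall` helper and the action names are hypothetical, and treating actions as an unordered set is a simplifying assumption.

```python
def precision_recall(gold_actions, agent_actions):
    """Compare two collections of actions, ignoring order and duplicates."""
    gold, predicted = set(gold_actions), set(agent_actions)
    true_positives = len(gold & predicted)
    # Precision: share of the agent's actions that were expected.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of the expected actions that the agent performed.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Illustrative data only: the agent took two unneeded actions and missed one step.
gold = ["search_flights", "select_flight", "enter_passenger", "pay"]
agent = ["search_flights", "select_flight", "open_help_page", "close_help_page", "pay"]
precision, recall = precision_recall(gold, agent)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.60 recall=0.75
```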

Benefits

  • Get paid for your expertise, with rates that can go up to $50/hour depending on your skills, experience, and project needs
  • Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments
  • Participate in an advanced AI project and gain valuable experience to enhance your portfolio
  • Influence how future AI models understand and communicate in your field of expertise

Seniority level: Entry level

Employment type: Part-time

Job function: Other

Industries: IT Services and IT Consulting

Employer: Mindrift

At Mindrift, we pride ourselves on fostering a culture of innovation and collaboration, where your expertise directly contributes to shaping the future of AI. With flexible, remote work options and competitive pay rates up to $50/hour, we offer a unique opportunity for professional growth while working on cutting-edge projects that align with your skills. Join us to be part of a team that values your insights and encourages you to influence the development of AI technology in meaningful ways.

Contact Detail:

Mindrift Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Evaluation Scenario Writer - AI Agent Testing Specialist role in London

✨Tip Number 1

Network like a pro! Reach out to people in the AI and tech space, especially those who work at Mindrift or similar companies. A friendly chat can open doors that a CV just can't.

✨Tip Number 2

Show off your skills! Create a portfolio showcasing your evaluation scenarios or any relevant projects. This gives us a taste of what you can do and sets you apart from the crowd.

✨Tip Number 3

Prepare for interviews by brushing up on your knowledge of LLMs and test design principles. We love candidates who can discuss their thought process and how they approach problem-solving.

✨Tip Number 4

Apply through our website! It’s the best way to ensure your application gets seen. Plus, it shows us you're serious about joining the team and contributing to exciting AI projects.

We think you need these skills to ace the Evaluation Scenario Writer - AI Agent Testing Specialist role in London

Analytical Mindset
Attention to Detail
Test Case Design
Gold-Standard Behaviour Definition
Data Analysis
Software Testing
NLP Annotation
Understanding of Test Design Principles
Strong Written Communication Skills
JSON/YAML Proficiency
Python Experience
Curiosity about AI-generated Content
Familiarity with LLM Capabilities
Understanding of Scoring Metrics

Some tips for your application 🫡

Tailor Your Resume: Make sure your resume highlights relevant experience and skills that match the job description. We want to see how your background aligns with the role of Evaluation Scenario Writer, so don’t hold back on showcasing your expertise in AI and testing!

Show Off Your English Skills: Since we need your resume in English, ensure it’s clear and well-written. Highlight your level of English proficiency right at the top. This helps us understand your communication skills, which are super important for this role.

Be Specific About Your Experience: When detailing your past roles, focus on specific projects or tasks that relate to designing test cases or working with AI. We love seeing concrete examples of how you’ve tackled similar challenges in the past!

Apply Through Our Website: Don’t forget to submit your application through our website! It’s the best way for us to keep track of your application and ensures you’re considered for the role. Plus, it’s super easy to do!

How to prepare for a job interview at Mindrift

✨Know Your Stuff

Make sure you brush up on your knowledge of AI, LLMs, and test design principles. Familiarise yourself with the specific requirements of the role, like creating structured test cases and defining gold-standard behaviours. This will show that you're not just interested in the job, but that you understand what it entails.

✨Show Off Your Analytical Skills

During the interview, be prepared to discuss how you approach problem-solving and analysis. Think of examples where you've had to analyse data or logs, and how you’ve iterated on scenarios to improve clarity. This will demonstrate your analytical mindset and attention to detail, which are crucial for this role.

✨Communicate Clearly

Since strong written communication skills are a must, practice articulating your thoughts clearly and concisely. You might even want to prepare a few examples of your previous work or projects that showcase your ability to write in structured formats like JSON/YAML. This will help you stand out as a candidate who can effectively communicate complex ideas.

✨Be Curious and Open-Minded

Mindrift values curiosity, especially when it comes to working with AI-generated content. Be ready to discuss your interest in AI and how you stay updated with the latest trends. Showing that you're eager to learn and adapt will resonate well with the interviewers and align with their mission of shaping the future of AI.
