At a Glance
- Tasks: Define and measure AI quality, build datasets, and automate evaluation workflows.
- Company: Join a leading tech firm focused on innovative AI solutions.
- Benefits: Competitive salary, flexible working hours, and opportunities for professional growth.
- Other info: Collaborative environment with exciting projects and career advancement.
- Why this job: Make a real impact on AI systems that enhance user experiences.
- Qualifications: Strong Python skills and experience with ML systems required.
The predicted salary is between €60,000 and €80,000 per year.
This role sits at the centre of how we measure and improve AI systems in production. You’ll define what good performance means across LLMs, ASR, TTS, and full speech-to-speech pipelines, and build the datasets, metrics, and evaluation systems that make AI quality measurable and comparable in the real world. You’ll work closely with engineering and product teams to ensure model changes lead to real improvements in user experience, not just better offline benchmarks.
What you’ll do:
- Design and run evaluations across LLM, ASR, TTS, and speech-to-speech systems
- Build real-world datasets and test cases from production behaviour and edge cases
- Define metrics and scorecards for model and system quality
- Benchmark internal models against external and frontier systems
- Build Python tools to automate evaluation workflows
- Create internal leaderboards, red-teaming setups, and regression tests
- Work with engineers and product teams to diagnose system failures
- Turn vague product goals into measurable evaluation frameworks
What this role is about:
- Defining and measuring AI quality in production systems
- Turning real user behaviour into structured evaluation signals
- Ensuring model changes improve real-world performance
- Understanding why AI systems fail, not just whether they do
What good looks like:
- You can translate “improved quality” into concrete metrics
- You think in terms of system impact (before vs after), not just accuracy
- You’re comfortable working across code, data, and production systems
- You care about real-world behaviour, not just benchmarks
Core skills:
- Strong Python (scripting, data analysis, tooling)
- Experience with ML systems, evaluation, or experimentation
- Understanding of LLMs or speech systems (ASR / TTS)
- Ability to design test cases and structured datasets
- Comfortable working with engineers and product teams
Nice to have:
- Experience with LLM evaluation or benchmarking
- Exposure to speech or multimodal systems
- Familiarity with production APIs or ML systems
- Experience with automated testing or CI-style workflows
AI Evaluations Engineer employer: ConnexAI
As an AI Evaluations Engineer, you will thrive in a dynamic and innovative environment that prioritises collaboration and continuous improvement. Our company fosters a culture of growth, offering ample opportunities for professional development while working on cutting-edge AI technologies that have a real-world impact. Located in a vibrant tech hub, we provide a supportive atmosphere where your contributions directly enhance user experiences and drive meaningful advancements in AI quality.
StudySmarter Expert Advice 🤫
We think this is how you could land the AI Evaluations Engineer role
✨Tip Number 1
Network like a pro! Reach out to folks in the AI and tech space, especially those who work with LLMs, ASR, and TTS. A friendly chat can open doors that a CV just can't.
✨Tip Number 2
Show off your skills! Create a portfolio or GitHub repo showcasing your Python projects related to AI evaluations. This gives potential employers a taste of what you can do beyond the written application.
✨Tip Number 3
Prepare for interviews by diving deep into real-world examples of AI systems you've worked on. Be ready to discuss how you’ve turned vague goals into measurable metrics – that’s what they want to hear!
✨Tip Number 4
Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in joining our team.
Some tips for your application 🫡
Tailor Your Application: Make sure to customise your CV and cover letter for the AI Evaluations Engineer role. Highlight your experience with Python, ML systems, and any relevant projects that showcase your ability to define and measure AI quality.
Showcase Your Skills: Don’t just list your skills; demonstrate them! Use specific examples from your past work where you’ve designed evaluations or built datasets. This will help us see how you think in terms of system impact and real-world behaviour.
Be Clear and Concise: When writing your application, keep it clear and to the point. We appreciate straightforward communication, so avoid jargon unless it’s necessary. Make it easy for us to understand your qualifications and enthusiasm for the role.
Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it shows you’re keen on joining the StudySmarter team!
How to prepare for a job interview at ConnexAI
✨Know Your AI Systems
Make sure you brush up on your knowledge of LLMs, ASR, and TTS systems. Understand how they work and their real-world applications. This will help you articulate how you can measure and improve their performance during the interview.
✨Prepare Real-World Examples
Think of specific instances where you've designed evaluations or built datasets in previous roles. Be ready to discuss how these experiences relate to the job description, especially in terms of turning vague goals into measurable outcomes.
✨Showcase Your Python Skills
Since strong Python skills are crucial for this role, be prepared to discuss your experience with scripting, data analysis, and tooling. If possible, bring examples of Python tools you've built or used in evaluation workflows.
✨Collaborate and Communicate
This role involves working closely with engineers and product teams, so highlight your teamwork and communication skills. Be ready to discuss how you've successfully collaborated in the past to diagnose system failures or improve user experience.