Evals Research Scientist / Engineer

London Full-Time 36000 - 60000 £ / year (est.) No home office possible

At a Glance

  • Tasks: Join us to develop safety evaluations for cutting-edge AI models and publish impactful results.
  • Company: Apollo Research focuses on auditing AI models to ensure they align with human oversight.
  • Benefits: Enjoy flexible hours, unlimited vacation, free meals, and a yearly professional development budget.
  • Why this job: Be part of a mission-driven team tackling AI risks while fostering a supportive and innovative culture.
  • Qualifications: No formal experience needed; just a scientific mindset and strong communication skills.
  • Other info: We sponsor UK work visas and offer relocation support up to £10,000.

The job board's predicted salary is between £36,000 and £60,000 per year (an automated estimate; the employer states its own salary range under Benefits).

Evals Research Scientist / Engineer at Apollo Research

Application Deadline: We’re accepting applications until 31 October 2025. Applications are considered on a rolling basis, and a response may take multiple weeks.

About The Opportunity

We’re looking for Research Scientists and Research Engineers who are excited to work on safety evaluations, the science of scheming, or control/monitoring for frontier models.

Responsibilities

Work with frontier labs like OpenAI, Anthropic, and Google DeepMind by running pre-deployment evaluations and collaborating closely on mitigations. See, e.g., our work on anti-scheming, OpenAI’s o1-preview system card, and Anthropic’s Opus 4 and Sonnet 4 system card.

Build evaluations for scheming-related properties (such as deceptive reasoning, sabotage, and deception tendencies). See our conceptual work on scheming, e.g. evaluation-based safety cases for scheming or how scheming could arise.

Work on the “science of scheming,” e.g. by studying model organisms or real-world examples of scheming in detail. Our goal is to develop a much better theoretical understanding of why models scheme and which components of training and deployment cause it.

Work on automating the entire evals pipeline. We aim to automate substantial parts of evals ideation, generation, running and analysis.

Design and evaluate AI control protocols. As agents operate over longer and longer time horizons, we’re shifting more effort to deployment-time monitoring and other control methods.

Note: We are not hiring for interpretability roles.

Key Requirements

We don’t require a formal background or industry experience and welcome self-taught candidates.

Experience in empirical research related to scheming, AI control and evaluations and a scientific mindset: You have designed and executed experiments. You can identify alternative explanations for findings and test alternative hypotheses to avoid overinterpreting results. This experience can come from academia, industry, or independent research.

Track record of excellent scientific writing and communication: You can understand and communicate complex technical concepts to our target audience and synthesize scientific results into coherent narratives.

Comprehensive experience in Large Language Model (LLM) steering and the supporting Data Science and Data Engineering skills. LLM steering can take many different forms, such as: a) prompting, b) LM agents and scaffolding, c) fluent LLM usage and integration into your own workflows, d) experience with supervised fine-tuning, e) experience with RL on LLMs.

Software engineering skills: Our entire stack uses Python. We’re looking for candidates with strong software engineering experience.

(Bonus) We have recently switched to Inspect as our primary evals framework, and we value experience with it.

Depending on your preferred role and how these characteristics weigh up, we can offer either a Research Scientist (RS) or a Research Engineer (RE) role.

Logistics

Start date: Target 2–3 months after first interview.

Time allocation: Full-time.

Location: London office, in-person (partial remote considered case-by-case).

Work visa sponsorship available for UK.

Benefits

Salary: 100k – 200k GBP (~135k – 270k USD).

Flexible work hours and schedule.

Unlimited vacation and sick leave.

Lunch, dinner and snacks provided on workdays.

Paid work trips and conferences.

Annual professional development budget: $1,000 USD.

About Apollo Research

Apollo Research focuses on risks from Loss of Control, especially deceptive alignment/scheming. We develop detection, science, and mitigation strategies for scheming and work closely with frontier AI companies.

About the Team

The current evals team includes Mikita Balesni, Jérémy Scheurer, Alex Meinke, Rusheb Shah, Bronson Schoen, Andrei Matveiakin, Felix Höfstätter, Axel Højmark, Nix Goldowsky‐Dill, Teun van der Weij, and Alex Lloyd. Marius Hobbhahn manages the team.

Equality Statement

Apollo Research is an Equal Opportunity Employer. We value diversity and are committed to providing equal opportunities to all, regardless of age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex, or sexual orientation.

How to Apply

Complete the application form and attach your CV. A cover letter is optional. Share any relevant work samples.

Interview Process

The process has multiple stages: a screening interview, a take-home test (~2.5 hrs), 3 technical interviews, and a final interview with CEO Marius. The technical interviews are aligned with job tasks; there are no general coding tests.

Privacy Statement

We protect your data. We use AI-powered tools for screening, but all decisions are made by humans. Contact [email protected] with privacy concerns.

Evals Research Scientist / Engineer employer: Apollo Research

At Apollo Research, we pride ourselves on fostering a dynamic and inclusive work culture that prioritises truth-seeking and constructive feedback. Located in the heart of London, our team enjoys flexible working hours, unlimited vacation, and a generous professional development budget, all while collaborating with leading experts in AI safety evaluations. We are committed to supporting employee growth and well-being, making us an exceptional employer for those passionate about advancing AI responsibly.

Contact Detail:

Apollo Research Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Evals Research Scientist / Engineer role

Tip Number 1

Familiarise yourself with the latest developments in AI safety evaluations, particularly around deceptive alignment. This knowledge will not only help you understand the role better but also allow you to engage in meaningful discussions during interviews.

Tip Number 2

Get hands-on experience with LLM steering and evaluation frameworks like Inspect. Building your own projects or contributing to open-source initiatives can showcase your skills and passion for the field.

Tip Number 3

Network with professionals in the AI research community. Attend relevant conferences or workshops where you can meet potential colleagues and learn more about the challenges they face in AI evaluations.

Tip Number 4

Prepare for the technical interviews by working on practical LLM evaluation tasks. Focus on creating model organisms and demonstrations that align with the responsibilities outlined in the job description to impress the interviewers.

We think you need these skills to ace the Evals Research Scientist / Engineer role

Empirical Research Methodologies
Experimental Design
Scientific Writing
Communication of Complex Concepts
Large Language Model (LLM) Steering
Prompt Engineering
LM Agents and Scaffolding
Fluent LLM Usage
Supervised Fine-Tuning
Reinforcement Learning (Human Feedback/AI Feedback)
Software Engineering Skills
API Development
System Design
Front-End Development
Python Programming
Experience with AI Control Protocols
Cyber Security Knowledge
Familiarity with Inspect Framework

Some tips for your application 🫡

Tailor Your CV: Make sure your CV highlights relevant experience in empirical research methodologies, scientific writing, and any specific skills related to Large Language Models (LLMs). Customise it to reflect the key requirements mentioned in the job description.

Craft a Strong Cover Letter: Although a cover letter is optional, it's a great opportunity to express your enthusiasm for the role. Discuss how your background aligns with the responsibilities of the Evals Research Scientist/Engineer position and mention any relevant projects or experiences.

Showcase Relevant Work Samples: If you have any work samples that demonstrate your experience with LLMs, evaluations, or related projects, include links in your application. This can help illustrate your capabilities and give the hiring team insight into your practical skills.

Prepare for Technical Interviews: Since the interview process includes technical interviews closely related to the job tasks, consider working on hands-on LLM evals projects beforehand. Familiarise yourself with the Inspect framework and be ready to discuss your approach and findings during the interviews.

How to prepare for a job interview at Apollo Research

Showcase Your Research Experience

Be prepared to discuss your empirical research methodologies and any experiments you've designed. Highlight how you've identified alternative explanations for findings, as this demonstrates a scientific mindset that the company values.

Communicate Complex Concepts Clearly

Since excellent scientific writing and communication are key requirements, practice explaining complex technical concepts in simple terms. This will help you convey your ideas effectively during the interview.

Familiarise Yourself with LLM Steering

Brush up on your knowledge of Large Language Model steering techniques, such as prompting and reinforcement learning. Be ready to discuss your experience with these methods and how they relate to the role you're applying for.

Prepare for Technical Interviews

The technical interviews will focus on tasks relevant to the job. Work on hands-on LLM evals projects, especially using Inspect, to demonstrate your practical skills and understanding of the evaluation process.
