Evals Research Scientist / Engineer

City of London | Full-Time | £100,000 - £200,000 / year (est.) | No home office possible

At a Glance

  • Tasks: Join us in evaluating AI safety and control for frontier models with leading labs.
  • Company: Apollo Research, a pioneer in AI risk management and safety evaluations.
  • Benefits: Competitive salary, unlimited vacation, flexible hours, and professional development budget.
  • Why this job: Make a real impact on AI safety while collaborating with top-tier tech companies.
  • Qualifications: Empirical research experience, strong communication skills, and software engineering expertise.
  • Other info: Dynamic team culture focused on truth-seeking and constructive feedback.

The predicted salary is between £100,000 and £200,000 per year.

Application deadline: We’re accepting applications until 31 October 2025. We encourage early submissions and will start interviews in early October.

ABOUT THE OPPORTUNITY

We’re looking for Research Scientists and Research Engineers who are excited to work on safety evaluations, the science of scheming, or control/monitoring for frontier models.

YOU WILL HAVE THE OPPORTUNITY TO

  • Work with frontier labs like OpenAI, Anthropic, and Google DeepMind by running pre-deployment evaluations and collaborating closely on mitigations. See, e.g., our work on anti-scheming, OpenAI’s o1-preview system card, and Anthropic’s Opus 4 and Sonnet 4 system card.
  • Build evaluations for scheming-related properties (such as deceptive reasoning, sabotage, and deception tendencies). See our conceptual work on scheming, e.g. evaluation-based safety cases for scheming or how scheming could arise.
  • Work on the science of scheming, e.g. by studying model organisms or real-world examples of scheming in detail. Our goal is to develop a much better theoretical understanding of why models scheme and which components of training and deployment cause it.
  • Work on automating the entire evals pipeline. We aim to automate substantial parts of evals ideation, generation, running and analysis.
  • Design and evaluate AI control protocols. Since agents have longer and longer time-horizons, we’re shifting more effort to deployment-time monitoring and other control methods.
  • Note: We are not hiring for interpretability roles.

KEY REQUIREMENTS

  • We don’t require a formal background or industry experience and welcome self-taught candidates.
  • Experience in empirical research related to scheming, AI control and evaluations and a scientific mindset: You have designed and executed experiments. You can identify alternative explanations for findings and test alternative hypotheses to avoid overinterpreting results. This experience can come from academia, industry, or independent research.
  • Track record of excellent scientific writing and communication: You can understand and communicate complex technical concepts to our target audience and synthesize scientific results into coherent narratives.
  • Comprehensive experience in Large Language Model (LLM) steering and the supporting Data Science and Data Engineering skills. LLM steering can take many different forms, such as: a) prompting, b) LM agents and scaffolding, c) fluent LLM usage and integration into your own workflows, d) experience with supervised fine-tuning, e) experience with RL on LLMs.
  • Software engineering skills: Our entire stack uses Python. We’re looking for candidates with strong software engineering experience.
  • (Bonus) We have recently switched to Inspect as our primary evals framework, and we value experience with it.
  • Depending on your preferred role and how these characteristics weigh up, we can offer either an RS or RE role.

We want to emphasize that people who feel they don’t fulfill all of these characteristics but nonetheless think they would be a good fit for the position are strongly encouraged to apply. We believe that excellent candidates can come from a variety of backgrounds and are excited to give you opportunities to shine.

LOGISTICS

  • Start Date: Target of 2-3 months after the first interview.
  • Time Allocation: Full-time
  • Location: The office is in London, and the building is shared with the London Initiative for Safe AI (LISA) offices. This is an in-person role. In rare situations, we may consider partially remote arrangements on a case-by-case basis.
  • Work Visas: We can sponsor UK visas

BENEFITS

  • Salary: 100k – 200k GBP (~135k – 270k USD)
  • Flexible work hours and schedule
  • Unlimited vacation
  • Unlimited sick leave
  • Lunch, dinner, and snacks are provided for all employees on workdays
  • Paid work trips, including staff retreats, business trips, and relevant conferences
  • A yearly $1,000 (USD) professional development budget

ABOUT APOLLO RESEARCH

The rapid rise in AI capabilities offers tremendous opportunities, but also presents significant risks.

At Apollo Research, we’re primarily concerned with risks from Loss of Control, i.e. risks coming from the model itself rather than e.g. humans misusing the AI. We’re particularly concerned with deceptive alignment / scheming, a phenomenon where a model appears to be aligned but is, in fact, misaligned and capable of evading human oversight. We work on the detection of scheming (e.g., building evaluations), the science of scheming (e.g., model organisms), and scheming mitigations (e.g., anti-scheming and control). We work closely with multiple frontier AI companies, e.g. to test their models before deployment or to collaborate on scheming mitigations.

At Apollo, we aim for a culture that emphasizes truth-seeking, being goal-oriented, giving and receiving constructive feedback, and being friendly and helpful.

ABOUT THE TEAM

The current evals team consists of several researchers. You will mostly work with the evals team, but you will likely sometimes interact with the governance team to translate technical knowledge into concrete recommendations.

Equality Statement: Apollo Research is an Equal Opportunity Employer. We value diversity and are committed to providing equal opportunities to all, regardless of age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex, or sexual orientation.

How to apply: Please complete the application form with your CV. A cover letter is optional. Please also feel free to share links to relevant work samples.

About the interview process: Our multi-stage process includes a screening interview, a take-home test (approx. 2.5 hours), 3 technical interviews, and a final interview with Marius (CEO). The technical interviews will be closely related to tasks the candidate would do on the job.

Evals Research Scientist / Engineer employer: COL Limited

At Apollo Research, we pride ourselves on being an exceptional employer, offering a dynamic work culture that fosters innovation and collaboration with leading AI labs. Our London office provides a vibrant environment where employees enjoy flexible hours, unlimited vacation, and a generous professional development budget, all while working on cutting-edge research that addresses critical challenges in AI safety. We are committed to supporting our team's growth and well-being, making Apollo Research a truly rewarding place to advance your career.

Contact Detail:

COL Limited Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land Evals Research Scientist / Engineer

✨Tip Number 1

Network like a pro! Reach out to people in the industry, especially those working at places like OpenAI or Google DeepMind. A friendly chat can open doors and give you insights that your CV just can't.

✨Tip Number 2

Prepare for those technical interviews by brushing up on your Python skills and understanding LLM steering. We want to see you shine, so practice explaining complex concepts in simple terms!

✨Tip Number 3

Don’t underestimate the power of a good portfolio! Showcase your projects related to scheming and AI control. It’s a great way to demonstrate your hands-on experience and creativity.

✨Tip Number 4

Apply through our website! It’s the best way to ensure your application gets seen. Plus, we love seeing candidates who take the initiative to connect directly with us.

We think you need these skills to ace Evals Research Scientist / Engineer

Empirical Research
AI Control
Safety Evaluations
Experimental Design
Scientific Writing
Communication of Technical Concepts
Large Language Model (LLM) Steering
Data Science
Data Engineering
Python Programming
Software Engineering
Prompting Techniques
Supervised Fine-Tuning
Reinforcement Learning on LLMs
Automation of Evaluation Pipelines

Some tips for your application 🫡

Get Your CV Spot On: Make sure your CV is tailored to the role. Highlight your experience in empirical research, scientific writing, and any relevant software engineering skills. We want to see how your background aligns with what we're looking for!

Show Off Your Communication Skills: Since we value excellent scientific writing, consider including a brief summary of a project or paper you've worked on. This will help us see how you communicate complex ideas clearly and effectively.

Be Yourself in the Cover Letter: While a cover letter isn't mandatory, if you choose to include one, let your personality shine through! Share why you're excited about the role and how you can contribute to our mission at Apollo Research.

Apply Early Through Our Website: Don’t wait until the last minute! We encourage early submissions, so get your application in through our website as soon as you can. This gives us more time to review your application and potentially set up an interview!

How to prepare for a job interview at COL Limited

✨Know Your Stuff

Make sure you brush up on your knowledge of scheming, AI control, and evaluations. Familiarise yourself with the latest research and developments in the field, especially from organisations like OpenAI and Google DeepMind. This will not only help you answer questions confidently but also show your genuine interest in the role.

✨Show Off Your Writing Skills

Since excellent scientific writing is a key requirement, prepare to discuss your past work and how you've communicated complex concepts. Bring examples of your writing or presentations that demonstrate your ability to synthesise information clearly. This will give the interviewers a taste of your communication style.

✨Get Hands-On with Python

As the entire stack uses Python, make sure you're comfortable with it. Brush up on your software engineering skills and be ready to discuss any relevant projects you've worked on. If you have experience with Inspect, definitely highlight that as it’s a bonus for this role!

✨Prepare for Technical Challenges

Expect technical interviews to focus on tasks you'd encounter in the job. Practice problem-solving and experiment design related to scheming and AI control. Think about how you would approach automating the evals pipeline and be ready to share your thought process during the interview.
