Research Scientist/Engineer (Evaluations) in London

London | Full-Time | 80000 - 120000 £ / year (est.) | No home office possible
COL Limited

At a Glance

  • Tasks: Run evaluations on cutting-edge AI systems and automate testing pipelines.
  • Company: Join Apollo Research, a leader in AI risk assessment and evaluation.
  • Benefits: Enjoy a competitive salary, unlimited vacation, and a professional development budget.
  • Other info: Dynamic team culture with opportunities for growth and collaboration.
  • Why this job: Be at the forefront of AI technology and make a real impact.
  • Qualifications: Strong Python skills and a passion for AI evaluation.

Application deadline: We are conducting interviews actively and aim to fill this role as soon as we find someone suitable.

ABOUT THE OPPORTUNITY

We develop and run evaluations that help assess the risks posed by scheming AIs. You will get to work with frontier labs like OpenAI, Anthropic, and Google DeepMind and be among the first to interact with new models. The ideal candidate loves rigorously testing frontier AI models and enjoys building efficient pipelines and automating them.

YOU WILL HAVE THE OPPORTUNITY TO:

  • Run pre-deployment evaluation campaigns on the most capable AI systems in the world. We partner with multiple labs, giving you access to a breadth of models that no single AI lab could offer. You'll be among the first people to interact with new models.
  • Deep dive into AI cognition. Scan through thousands of model transcripts to surface behavioral patterns that no one has ever observed before. These patterns are often deeply surprising and fascinating to study, e.g. the non-standard language and the reward-seeking reasoning described in our anti-scheming paper.
  • Build new evaluations for frontier risks, from designing novel test environments to scaling them across hundreds of distinct scenarios.
  • Work directly with frontier AI developers. Share your findings, engage with their feedback, and see your evaluations directly inform deployment decisions for the most capable AI systems in the world.
  • Automate and improve the evaluation pipeline. We already use automation across building, running, and analyzing evals. Rapid progress in agent capabilities opens up radically new possibilities, and you'll have the freedom to rethink and reshape the pipeline as they emerge.

KEY REQUIREMENTS

  • Software engineering skills: Our entire stack uses Python. We're looking for candidates with strong software engineering experience. Ideally, you have experience shipping and maintaining production Python code, and know how to factor messy problems into clean abstractions that others can use and extend.
  • Process optimisation: You always try to improve workflows. Pre-deployment evaluations are very fast-paced, so ideally you love shaving friction off your workflows wherever possible.
  • Data Analysis & Pattern Recognition: You can extract signal from large, messy datasets. You're comfortable with quantitative analysis and know when qualitative assessment is more appropriate. You can identify anomalies and unexpected model behaviors.
  • Writing and communication: You succinctly convey qualitative and quantitative findings to both technical and non-technical audiences.
  • AI power-user: You are curious about the capabilities and propensities of frontier AI models. You have experience using different models, know which ones to use for which tasks, when not to use AI, and you always experiment with new AI workflows.
  • (Bonus) We use Inspect as our primary evals framework and value experience with it.

We want to emphasize that people who feel they don’t fulfill all of these characteristics but nonetheless think they would be a good fit for the position are strongly encouraged to apply. We believe that excellent candidates can come from a variety of backgrounds and are excited to give you opportunities to shine. We don’t require a formal background or industry experience and welcome self-taught candidates.

BENEFITS

This role offers a market-competitive salary, equity, and competitive benefits.

  • Salary: 100k - 200k GBP (~135k - 270k USD)
  • Flexible work hours and schedule
  • Unlimited vacation
  • Unlimited sick leave
  • Lunch, dinner, and snacks are provided for all employees on workdays
  • Paid work trips, including staff retreats, business trips, and relevant conferences
  • A yearly $1,000 (USD) professional development budget

LOGISTICS

  • Time Allocation: Full-time
  • Location: The office is in London, and the building is shared with the London Initiative for Safe AI (LISA) offices. This is an in-person role. In rare situations, we may consider partially remote arrangements on a case-by-case basis.
  • Work Visas: We can sponsor UK visas

ABOUT APOLLO RESEARCH

The rapid rise in AI capabilities offers tremendous opportunities, but also presents significant risks. At Apollo Research, we’re primarily concerned with risks from Loss of Control, i.e. risks coming from the model itself rather than e.g. humans misusing the AI. We’re particularly concerned with deceptive alignment/scheming, a phenomenon where a model appears to be aligned but is, in fact, misaligned and capable of evading human oversight.

We work on the detection of scheming (e.g. building evaluations and novel evaluation techniques), the science of scheming (e.g. model organisms and the study of scaling trends), and scheming mitigations (e.g. control). We work closely with multiple frontier AI companies, e.g. to test their models before deployment and collaborate on fundamental research.

At Apollo, we aim for a culture that emphasizes truth-seeking, being goal-oriented, giving and receiving constructive feedback, and being friendly and helpful. If you’re interested in more details about what it’s like working at Apollo, you can find more information here.

ABOUT THE TEAM

The current evals team consists of Jérémy Scheurer, Alex Meinke, Bronson Schoen, Felix Höfstäter, Axel Højmark, Teun van der Weij, Alex Lloyd and Mia Hopman. Alex Meinke coordinates the research agenda with guidance from Marius Hobbhahn, though team members lead individual projects. You will mostly work with the evals team as well as our team of software engineers, but you will likely sometimes interact with the governance team to translate technical knowledge into concrete recommendations. You can find our full team here.

Equality Statement

Apollo Research is an Equal Opportunity Employer. We value diversity and are committed to providing equal opportunities to all, regardless of age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex, or sexual orientation.

How to apply

Please complete the application form with your CV. A cover letter is optional. Please also feel free to share links to relevant work samples.

About the interview process

Our multi-stage process includes a screening interview, a take-home test (approx. 2.5 hours), 3 technical interviews, and a final interview with Marius (CEO). The technical interviews will be closely related to tasks the candidate would do on the job. There are no LeetCode-style general coding interviews. If you want to prepare for the interviews, we suggest working on hands-on LLM evals projects (e.g. as suggested in our starter guide), such as building LM agent evaluations in Inspect.

Your Privacy and Fairness in Our Recruitment Process

We are committed to protecting your data, ensuring fairness, and adhering to workplace fairness principles in our recruitment process. To enhance hiring efficiency, we use AI-powered tools to assist with tasks such as resume screening. These tools are designed and deployed in compliance with internationally recognized AI governance frameworks. Your personal data is handled securely and transparently. We adopt a human-centred approach: all resumes are screened by a human and final hiring decisions are made by our team. If you have questions about how your data is processed or wish to report concerns about fairness, please contact us.

Research Scientist/Engineer (Evaluations) in London employer: COL Limited

At Apollo Research, we pride ourselves on being an exceptional employer, offering a dynamic work culture that fosters innovation and collaboration in the rapidly evolving field of AI. Our London office provides a unique opportunity to engage with leading AI labs while enjoying competitive salaries, unlimited vacation, and a supportive environment that prioritises professional development and employee well-being. Join us to be at the forefront of AI evaluation and contribute to meaningful advancements in technology.

Contact Detail:

COL Limited Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land Research Scientist/Engineer (Evaluations) in London

✨Tip Number 1

Get to know the company and its mission. Research Apollo Research and their work on AI risks. This will help you tailor your conversations during interviews and show that you're genuinely interested in what they do.

✨Tip Number 2

Practice your technical skills! Since this role involves software engineering and data analysis, brush up on your Python coding and be ready to discuss your past projects. Hands-on experience with LLM evals will definitely give you an edge.

✨Tip Number 3

Prepare for the interview process by simulating it. Work on take-home tests or mock interviews with friends. This will help you get comfortable with the format and improve your confidence when discussing your findings and methodologies.

✨Tip Number 4

Don’t hesitate to apply through our website! Even if you don’t meet every single requirement, we encourage you to throw your hat in the ring. You might just surprise us with your unique background and skills!

We think you need these skills to ace Research Scientist/Engineer (Evaluations) in London

Python Programming
Software Engineering
Process Optimisation
Data Analysis
Pattern Recognition
Quantitative Analysis
Qualitative Assessment
Writing Skills
Communication Skills
AI Model Evaluation
Automation
Workflow Improvement
Curiosity about AI Capabilities
Experience with Inspect Framework

Some tips for your application 🫡

Show Off Your Skills: Make sure to highlight your software engineering skills, especially in Python. We want to see how you've tackled messy problems and turned them into clean solutions that others can build on.

Be Clear and Concise: When you write about your experiences, keep it straightforward. We appreciate candidates who can convey complex ideas simply, so think about how you can explain your findings to both technical and non-technical folks.

Tailor Your Application: Don’t just send a generic CV! Tailor your application to reflect the specific requirements of the Research Scientist/Engineer role. Show us why you're the perfect fit for our team and what unique perspectives you bring.

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for this exciting opportunity!

How to prepare for a job interview at COL Limited

✨Know Your AI Models

Familiarise yourself with the latest AI models and their capabilities. Since you'll be working with frontier labs like OpenAI and Google DeepMind, understanding their models will help you engage in meaningful discussions during the interview.

✨Showcase Your Python Skills

Since the role requires strong software engineering skills in Python, be prepared to discuss your past projects. Bring examples of production code you've shipped and explain how you tackled messy problems with clean abstractions.

✨Demonstrate Process Optimisation

Highlight your experience in improving workflows. Share specific examples of how you've streamlined processes in previous roles, especially in fast-paced environments, as this is crucial for pre-deployment evaluations.

✨Communicate Clearly

Practice conveying complex findings succinctly to both technical and non-technical audiences. This skill is vital for sharing your insights on model behaviours and evaluation results, so consider preparing a few key points to illustrate your communication style.
