Evals Research Scientist / Engineer

London Full-Time 36000 - 60000 £ / year (est.) No home office possible

At a Glance

  • Tasks: Join our evals team to work on AI safety and model evaluations.
  • Company: Apollo Research focuses on ensuring AI systems are safe and aligned with human values.
  • Benefits: Enjoy flexible hours, unlimited vacation, and a yearly professional development budget.
  • Why this job: Be part of a friendly, goal-oriented culture that values truth-seeking and collaboration.
  • Qualifications: No formal experience required; self-taught candidates are welcome!
  • Other info: In-person role in London with potential for partial remote work.

The predicted salary is between 36000 - 60000 £ per year.

Application deadline: We're accepting applications until 31 October 2025. We encourage early submissions and will start interviews in early October.

ABOUT THE OPPORTUNITY

We're looking for Research Scientists and Research Engineers who are excited to work on safety evaluations, the science of scheming, or control/monitoring for frontier models.

YOU WILL HAVE THE OPPORTUNITY TO

  • Work with frontier labs like OpenAI, Anthropic, and Google DeepMind by running pre-deployment evaluations and collaborating closely on mitigations. See, e.g., our work on anti-scheming, OpenAI's o1-preview system card, and Anthropic's Opus 4 and Sonnet 4 system card.
  • Build evaluations for scheming-related properties (such as deceptive reasoning, sabotage, and deception tendencies). See our conceptual work on scheming, e.g. evaluation-based safety cases for scheming or how scheming could arise.
  • Work on the science of scheming, e.g. by studying model organisms or real-world examples of scheming in detail. Our goal is to develop a much better theoretical understanding of why models scheme and which components of training and deployment cause it.
  • Work on automating the entire evals pipeline. We aim to automate substantial parts of evals ideation, generation, running and analysis.
  • Design and evaluate AI control protocols. Since agents have longer and longer time horizons, we're shifting more effort to deployment-time monitoring and other control methods.

Note: We are not hiring for interpretability roles.

KEY REQUIREMENTS

We don't require a formal background or industry experience and welcome self-taught candidates.

  • Experience in empirical research related to scheming, AI control and evaluations, and a scientific mindset: You have designed and executed experiments. You can identify alternative explanations for findings and test alternative hypotheses to avoid overinterpreting results. This experience can come from academia, industry, or independent research.
  • Track record of excellent scientific writing and communication: You can understand and communicate complex technical concepts to our target audience and synthesize scientific results into coherent narratives.
  • Comprehensive experience in Large Language Model (LLM) steering and the supporting Data Science and Data Engineering skills. LLM steering can take many different forms, such as: a) prompting, b) LM agents and scaffolding, c) fluent LLM usage and integration into your own workflows, d) experience with supervised fine-tuning, e) experience with RL on LLMs.
  • Software engineering skills: Our entire stack uses Python. We're looking for candidates with strong software engineering experience.
  • (Bonus) We have recently switched to Inspect as our primary evals framework, and we value experience with it.

Depending on your preferred role and how these characteristics weigh up, we can offer either an RS or RE role.

We want to emphasize that people who feel they don't fulfill all of these characteristics but nonetheless think they would be a good fit for the position are strongly encouraged to apply. We believe that excellent candidates can come from a variety of backgrounds and are excited to give you opportunities to shine.

LOGISTICS

  • Start Date: Target of 2-3 months after the first interview.
  • Time Allocation: Full-time.
  • Location: The office is in London, and the building is shared with the London Initiative for Safe AI (LISA) offices. This is an in-person role. In rare situations, we may consider partially remote arrangements on a case-by-case basis.
  • Work Visas: We can sponsor UK visas.

BENEFITS

  • Salary: 100k – 200k GBP (~135k – 270k USD)
  • Flexible work hours and schedule
  • Unlimited vacation
  • Unlimited sick leave
  • Lunch, dinner, and snacks are provided for all employees on workdays
  • Paid work trips, including staff retreats, business trips, and relevant conferences
  • A yearly $1,000 (USD) professional development budget

ABOUT APOLLO RESEARCH

The rapid rise in AI capabilities offers tremendous opportunities, but also presents significant risks.

At Apollo Research, we're primarily concerned with risks from Loss of Control, i.e. risks coming from the model itself rather than e.g. humans misusing the AI. We're particularly concerned with deceptive alignment / scheming, a phenomenon where a model appears to be aligned but is, in fact, misaligned and capable of evading human oversight. We work on the detection of scheming (e.g., building evaluations), the science of scheming (e.g., model organisms), and scheming mitigations (e.g., anti-scheming and control). We work closely with multiple frontier AI companies, e.g. to test their models before deployment or collaborate on scheming mitigations.

At Apollo, we aim for a culture that emphasizes truth-seeking, being goal-oriented, giving and receiving constructive feedback, and being friendly and helpful.

ABOUT THE TEAM

The current evals team consists of several researchers. You will mostly work with the evals team, but you will likely sometimes interact with the governance team to translate technical knowledge into concrete recommendations.

Equality Statement: Apollo Research is an Equal Opportunity Employer. We value diversity and are committed to providing equal opportunities to all, regardless of age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex, or sexual orientation.

How to apply: Please complete the application form with your CV. A cover letter is optional. Please also feel free to share links to relevant work samples.

About the interview process: Our multi-stage process includes a screening interview, a take-home test (approx. 2.5 hours), 3 technical interviews, and a final interview with Marius (CEO). The technical interviews will be closely related to tasks the candidate would do on the job.


Evals Research Scientist / Engineer employer: COL Limited

At Apollo Research, we pride ourselves on fostering a collaborative and innovative work culture that prioritises truth-seeking and constructive feedback. Located in the heart of London, our team enjoys flexible working hours, unlimited vacation, and a generous professional development budget, all while contributing to cutting-edge research in AI safety. We are committed to diversity and inclusion, ensuring that every employee has the opportunity to grow and thrive in their career.

Contact Detail:

COL Limited Recruiting Team

StudySmarter Expert Advice

We think this is how you could land Evals Research Scientist / Engineer

✨Tip Number 1

Familiarise yourself with the latest advancements in AI, particularly around LLMs and their applications. Understanding the nuances of steering LLMs and how they can be used in evaluations will give you a significant edge during discussions.

✨Tip Number 2

Engage with the Apollo Research team by following them on social media or participating in relevant forums. This will not only keep you updated on their work but also help you understand their culture and values, which is crucial for fitting in.

✨Tip Number 3

Prepare for the technical interviews by working on hands-on projects related to LLM evaluations. This practical experience will demonstrate your skills and enthusiasm for the role, making you a more attractive candidate.

✨Tip Number 4

Showcase your collaborative spirit by highlighting any past experiences where you've worked in teams or contributed to group projects. Apollo values teamwork, so demonstrating your ability to thrive in a collaborative environment will strengthen your application.

We think you need these skills to ace Evals Research Scientist / Engineer

Large Language Model (LLM) steering
Prompting techniques
LM agents & scaffolding
Fluent LLM usage
Supervised fine-tuning
Reinforcement Learning (RL)
Software engineering fundamentals
API development
Data science and system design
Data engineering
Front-end development
Empirical research experience
Experimental design and execution
Scientific mindset
Ability to propose and test alternative hypotheses
Collaboration and teamwork skills
Results-oriented approach
Adaptability and willingness to learn new skills

Some tips for your application

Tailor Your CV: Make sure your CV highlights relevant experience and skills that align with the role of Evals Research Scientist/Engineer. Focus on your experience with LLMs, empirical research, and software engineering.

Craft a Strong Cover Letter: Although a cover letter is optional, it’s a great opportunity to express your enthusiasm for the role and the company. Discuss why you’re interested in Apollo Research and how your background makes you a good fit.

Showcase Relevant Work Samples: If you have any projects or work samples related to LLM evaluations or AI safety, include links in your application. This can help demonstrate your practical experience and skills.

Prepare for the Interview Process: Familiarise yourself with the interview structure and prepare for hands-on LLM evals projects. Review the starter guide provided by Apollo Research to get a head start on potential tasks.

How to prepare for a job interview at COL Limited

✨Understand the Role and Team Dynamics

Familiarise yourself with the specific responsibilities of the Evals Research Scientist/Engineer role and the dynamics of the evals team. Knowing who you will be working with and their projects can help you tailor your responses and show genuine interest in the team's work.

✨Showcase Your LLM Experience

Since steering Large Language Models (LLMs) is a core skill for this position, be prepared to discuss your experience with LLMs. Highlight any projects where you've used prompting, fine-tuning, or integrated LLMs into workflows, as this will demonstrate your practical knowledge.

✨Prepare for Technical Interviews

The technical interviews will focus on tasks relevant to the job. Brush up on empirical research methods, software engineering principles, and any specific tools mentioned in the job description, like Inspect. Practising hands-on LLM evals projects can give you a significant edge.

✨Emphasise Collaborative Values

Apollo Research values a collaborative and results-oriented culture. Be ready to discuss examples from your past experiences that showcase your ability to work well in teams, give and receive constructive feedback, and contribute positively to a friendly work environment.
