Research Scientist - Science of Evaluations

London · Full-Time · £52,000 – £78,000 / year (est.) · No home office possible

At a Glance

  • Tasks: Conduct cutting-edge research on AI capabilities and safety, collaborating with top experts.
  • Company: Join the AI Security Institute, a leading government team focused on AI risks and safety.
  • Benefits: Enjoy flexible working, generous leave, and professional development opportunities.
  • Why this job: Make a real impact in AI safety while working in an innovative and supportive environment.
  • Qualifications: PhD or strong experience in machine learning, with a passion for empirical research.
  • Other info: This role offers competitive salaries and a chance to shape the future of AI evaluations.

The predicted salary is between £52,000 and £78,000 per year.

The AI Security Institute is the world's largest government team dedicated to understanding AI capabilities and risks. Our mission is to equip governments with an empirical understanding of the safety of advanced AI systems. We conduct research to understand the capabilities and impacts of advanced AI, and we develop and test risk mitigations. We focus on risks with security implications, including the potential of AI to assist with the development of chemical and biological weapons, its use in cyber-attacks and in crimes such as fraud, and the possibility of loss of control.

The risks from AI are not science fiction; they are urgent. By combining the agility of a tech start-up with the expertise and mission-driven focus of government, we’re building a unique and innovative organisation to prevent AI’s harms from impeding its potential.

AISI’s Science of Evaluations team will conduct applied and foundational research focused on two areas at the core of our mission: (i) measuring existing frontier AI system capabilities and (ii) predicting the capabilities of a system before running an evaluation.

Measurement of Capabilities: The goal is to develop and apply rigorous scientific techniques for the measurement of frontier AI system capabilities, so they are accurate, robust, and useful in decision making. This is a nascent area of research which supports one of AISI's core products: conducting tests of frontier AI systems and feeding back results, insights, and recommendations to model developers and policy makers. The team will be an independent voice on the quality of our testing reports and the limitations of our evaluations. You will collaborate closely with researchers and engineers from the workstreams who develop and run our evaluations, getting into the details of their key strengths and weaknesses, proposing improvements, and developing techniques to get the most out of our results.

The key challenge is increasing the confidence in our claims about system capabilities, based on solid evidence and analysis. Directions we are exploring include:

  • Running internal red teaming of testing exercises and adversarial collaborations with the evaluations teams, and developing “sanity checks” to ensure the claims made in our reports are as strong as possible.
  • Running in-depth analyses of evaluations results to understand successes and failures and using these insights to create best practices for testing exercises.
  • Developing our approach to uncertainty quantification and significance testing, increasing statistical power (given time and token constraints).
  • Developing methods for inferring model capabilities across given domains from task or benchmark success rates, and assigning confidence levels to claims about capabilities.
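As an illustration of the uncertainty-quantification direction above (a hypothetical sketch, not AISI's actual methodology), evaluation results are often a count of tasks solved out of a small, budget-limited sample, where a bare success rate can overstate confidence. A Wilson score interval is one standard way to attach honest error bars to such a rate:

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial success rate.

    Gives better coverage than the naive normal approximation when
    trials are few or the rate is near 0 or 1 -- common for agentic
    evaluations run under tight time and token constraints.
    """
    if trials == 0:
        return (0.0, 1.0)
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return (max(0.0, centre - half), min(1.0, centre + half))

# A model solving 7 of 20 tasks: the point estimate (0.35) hides
# substantial uncertainty at this sample size.
lo, hi = wilson_interval(7, 20)
print(f"success rate: 0.35, 95% CI: [{lo:.2f}, {hi:.2f}]")
```

Intervals this wide are one reason claims about capabilities need explicit confidence levels rather than single benchmark scores.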

Predictive Evaluations: The goal is to develop approaches to estimate the capabilities of frontier AI systems on tasks or benchmarks, before they are run. Ideally, we would be able to do this at some point early in the training process of a new model, using information about the architecture, dataset, or training compute. This research aims to provide us with advance warning of models reaching a particular level of capability, where additional safety mitigations may need to be put in place. This work is complementary to both safety cases—an AISI foundational research effort—and AISI’s general evaluations work.

This topic is an area of active research, and we believe it is poised to develop rapidly. We are particularly interested in developing predictive evaluations for complex, long-horizon agent tasks, since we believe this will be the most important type of evaluation as AI capabilities advance. You will help develop this field of research, both through direct technical work and via collaborations with external experts, partner organisations, and policy makers.

Across both focus areas, there will be significant scope to contribute to the overall vision and strategy of the Science of Evaluations team as an early hire. You’ll receive coaching from your manager and mentorship from the research directors at AISI, and work closely with talented Policy/Strategy Leads, Research Engineers, and Research Scientists.

Responsibilities: This role offers the opportunity to progress deep technical work at the frontier of AI safety and governance. Your work will include:

  • Running internal red teaming of testing exercises and adversarial collaborations with the evaluations teams, and developing “sanity checks” to ensure the claims made in our reports are as strong as possible.
  • Conducting in-depth analysis of evaluations methodology and results, diagnosing possible sources of uncertainty or bias, to improve our confidence in estimates of AI system capabilities.
  • Improving the statistical analysis of evaluations results (e.g. model selection, hypothesis testing, significance testing, uncertainty quantification).
  • Developing and implementing internal best-practices and protocols for evaluations and testing exercises.
  • Staying well informed of the details and strengths and weaknesses of evaluations across domains in AISI and the state of the art in frontier AI evaluations research more broadly.
  • Conducting research on predictive evaluations by applying the latest techniques from the published literature to AISI’s internal evaluations, as well as conducting novel research to improve these techniques.
  • Writing and editing scientific reports and other materials aimed at diverse audiences, focusing on synthesising empirical results and recommendations to key decision-makers, ensuring high standards in clarity, precision, and style.

Person Specification: To set you up for success, we are looking for some of the following skills, experience and attitudes, but we are flexible in shaping the role to your background and expertise.

  • Experience working within a world-leading team in machine learning or a related field.
  • A strong track record of academic excellence (e.g. a PhD in a technical field and/or spotlight papers at top-tier conferences).
  • A comprehensive understanding of large language models.
  • Broad experience in empirical research methodologies and statistical analysis.
  • A deep care for methodological and statistical rigour, balanced with pragmatism.
  • A proven track record of excellent scientific writing and communication.
  • Motivation to conduct technical research with an emphasis on direct policy impact.
  • The ability to work autonomously and in a self-directed way with high agency.
  • Your own voice and experience, paired with an eagerness to support your colleagues.

Salary & Benefits: We are hiring individuals at all ranges of seniority and experience within this research unit. The full range of salaries is available below:

  • Level 3 - Total Package £65,000 - £75,000
  • Level 4 - Total Package £85,000 - £95,000
  • Level 5 - Total Package £105,000 - £115,000
  • Level 6 - Total Package £125,000 - £135,000
  • Level 7 - Total Package £145,000

The Department for Science, Innovation and Technology offers a competitive mix of benefits including a culture of flexible working, a minimum of 25 days of paid annual leave, and an extensive range of learning & professional development opportunities.

Selection Process: In accordance with the Civil Service Commission rules, the following outlines the stages of the selection process. Candidates should expect to go through some or all of the following stages once an application has been submitted:

  • Initial interview
  • Technical take home test
  • Second interview and review of take home test
  • Third interview
  • Final interview with members of the senior team

Required Experience: We select based on skills and experience regarding the following areas:

  • Empirical research and statistical analysis
  • Frontier AI model architecture, training, evaluation knowledge
  • AI safety research knowledge
  • Written communication
  • Verbal communication
  • Research problem selection

Additional Information: Successful candidates must undergo a criminal record check and obtain baseline personnel security standard (BPSS) clearance before they can be appointed. There is a strong preference for candidates eligible for counter-terrorist check (CTC) clearance.

Research Scientist - Science of Evaluations employer: AI Security Institute

The AI Security Institute is an exceptional employer, offering a unique blend of government mission-driven focus and the agility of a tech start-up. Employees benefit from a culture of flexible working, extensive professional development opportunities, and a commitment to impactful research in AI safety. With a supportive environment that encourages collaboration and innovation, this role provides a chance to contribute to critical advancements in AI governance while enjoying competitive salaries and comprehensive benefits.

Contact Detail:

AI Security Institute Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land Research Scientist - Science of Evaluations

Tip Number 1

Familiarise yourself with the latest research in AI safety and evaluation methodologies. Being well-versed in current literature will not only help you during interviews but also demonstrate your genuine interest in the field.

Tip Number 2

Network with professionals in the AI safety community. Attend relevant conferences or webinars, and engage with researchers on platforms like LinkedIn. This can provide insights into the role and may even lead to referrals.

Tip Number 3

Prepare for technical discussions by brushing up on statistical analysis techniques and empirical research methodologies. Be ready to discuss how you've applied these in past projects, as this will be crucial for the role.

Tip Number 4

Showcase your ability to communicate complex ideas clearly. Practice explaining your previous research or projects to non-technical audiences, as this skill is highly valued in roles that involve policy impact.

We think you need these skills to ace Research Scientist - Science of Evaluations

Empirical Research Methodologies
Statistical Analysis
Machine Learning Expertise
Large Language Models Understanding
Experimental Design
A/B Testing
Bayesian Inference
Hypothesis Testing
Significance Testing
Uncertainty Quantification
Data Visualisation
Scientific Writing
Technical Communication
Problem-Solving Skills
Collaboration and Teamwork
Adaptability in a Fast-Paced Environment

Some tips for your application 🫡

Understand the Role: Before applying, make sure you thoroughly understand the responsibilities and requirements of the Research Scientist - Science of Evaluations position. Tailor your application to highlight how your skills and experiences align with their mission and objectives.

Highlight Relevant Experience: In your CV and cover letter, emphasise your experience in empirical research, statistical analysis, and any work related to AI safety or machine learning. Use specific examples to demonstrate your expertise and how it relates to the role.

Craft a Compelling Cover Letter: Write a cover letter that not only outlines your qualifications but also conveys your passion for AI safety and governance. Discuss why you want to work at the AI Security Institute and how you can contribute to their goals.

Proofread Your Application: Before submitting, carefully proofread your CV and cover letter for any spelling or grammatical errors. A polished application reflects your attention to detail and professionalism, which are crucial in a research role.

How to prepare for a job interview at AI Security Institute

Showcase Your Research Experience

Be prepared to discuss your previous research projects in detail, especially those related to AI safety and evaluations. Highlight any publications or presentations you've made, as this demonstrates your expertise and commitment to the field.

Understand the Role's Technical Requirements

Familiarise yourself with the specific methodologies and statistical analyses mentioned in the job description. Being able to discuss how you would apply these techniques in practice will show that you are well-prepared and knowledgeable.

Prepare for Technical Proficiency Tests

Expect technical tests as part of the interview process. Brush up on your skills in empirical research methodologies, statistical analysis, and AI model evaluation techniques to ensure you can perform well under pressure.

Communicate Clearly and Effectively

Since the role involves writing and communicating complex concepts, practice explaining your research and findings in a clear and concise manner. Tailor your explanations to suit both technical and non-technical audiences, showcasing your ability to bridge the gap.
