At a Glance
- Tasks: Conduct cutting-edge research on AI evaluation methods and collaborate with top experts.
- Company: Join the world's leading team focused on AI security and governance.
- Benefits: Competitive salary, generous leave, remote work flexibility, and professional development opportunities.
- Why this job: Make a real impact on AI safety and governance while working with influential leaders.
- Qualifications: Experience in ML or evaluation science; strong analytical skills and creativity.
- Other info: Dynamic environment with significant autonomy and career growth potential.
The predicted salary is between £65,000 and £145,000 per year.
About the AI Security Institute
The AI Security Institute is the world's largest and best-funded team dedicated to understanding advanced AI risks and translating that knowledge into action. We’re in the heart of the UK government with direct lines to No. 10 (the Prime Minister's office), and we work with frontier developers and governments globally. We’re here because governments are critical for advanced AI going well, and UK AISI is uniquely positioned to mobilise them. Our resources, agility and international influence make this the best place to shape both AI development and government action.
About the Team
AISI's Science of Evaluation team develops rigorous techniques for measuring and forecasting AI capabilities, ensuring evaluation results are robust, meaningful, and useful for governance. Evaluations underpin both scientific understanding and policy decisions about frontier AI. Yet current methodologies are poorly equipped to surface what matters most: underlying capabilities, dangerous failure modes, forecasts of future performance, and robustness across settings. We address this gap by stress-testing the claims and methods in AISI’s testing reports, improving evaluation methods, and building new analytical tools. Our research is problem-driven, methodologically grounded, and focused on impact. Our work falls into three streams:
- Methodological red teaming: Independently auditing evidence and claims in evaluation reports shared with model developers.
- Consulting partnerships: Collaborating with AISI evaluation teams to improve methodologies and practices.
- Targeted research bets: Pursuing foundational work that enables new insights into model capabilities.
New research agenda focus (in addition to core team responsibilities): Frontier agents increasingly use massive inference budgets on complex, long-horizon tasks. This makes measuring model horizons, estimating performance ceilings, and maintaining research velocity harder and more expensive. We're developing evaluation methods that remain informative as task budgets exceed 10M tokens per attempt and model horizons surpass the longest available tasks.
Role Summary
This research scientist role focuses on evaluation methods for frontier AI, with emphasis on long-horizon agents and inference-compute scaling. You’ll design and conduct experiments that extract deeper signal from evaluation data, uncovering underlying capabilities. You’ll collaborate with engineers and domain experts across AISI and with external partners. Researchers on this team have substantial autonomy to shape independent agendas and push the frontier of what evaluations can reveal.
Example Projects
- Develop methods to forecast long-horizon performance under increasing inference budgets, including predictive models based on task and model characteristics.
- Design approaches that preserve observability when agents exceed available task lengths (e.g., proxy measurements, task decomposition, data acquisition strategies).
- Support evaluation suite design for improved coverage, predictive validity, and robustness.
- Engineer tools for quantitative transcript analysis to identify failure modes and capability signals.
Responsibilities
- Applied research on evaluation methodology, including new techniques and tools.
- Run and analyze evaluation results to stress-test claims, characterize model capabilities, and inform policy-relevant reports.
- Track the state of the art in frontier AI evaluation research across AISI and externally, and contribute to AISI's presence at ML conferences.
- Long-horizon / inference scaling focus: Design and run experiments that are more informative than end-to-end pass/fail metrics.
- Develop and engineer approaches to long-horizon task design, including automation and internal structure (checkpoints, bottlenecks, progress metrics).
- Estimate capability upper bounds by identifying measurable bottleneck skills relevant to long-horizon performance.
Person Specification
We’re flexible on exact background and expect successful candidates to meet many (but not necessarily all) of the criteria below. Depending on experience, we’ll consider candidates at Research Scientist or Senior Research Scientist level. We also welcome applications from earlier-career researchers (2–3 years of hands-on LLM experience) who demonstrate creative and rigorous empirical instincts.
- Strong track record in applied ML, evaluation science, or experimental fields with significant methodological challenges (e.g., a PhD in a technical field, publications at top-tier venues such as ICML or NeurIPS, or substantial real-world deployments).
- Significant hands-on experience with LLMs and agents.
- Strong motivation for impactful work at the intersection of science, safety, and governance.
- Self-directed and adaptable; comfortable with ambiguity in a growing team.
Nice to Have
- Task design and validation experience (checkpoints, verifiers, progress metrics).
- Transcript analysis or behavioral measurement.
- Experimental design or measurement tooling from other disciplines (psychometrics, behavioral economics).
Core Logistical Requirements
- You should be able to spend at least 4 days per week working with us.
- You should be able to join us for at least 18 months.
- You should be able to work from our office in London for parts of the week, but we provide flexibility for remote work.
What We Offer
- Impact you couldn't have anywhere else.
- Incredibly talented, mission-driven and supportive colleagues.
- Direct influence on how frontier AI is governed and deployed globally.
- Work with the Prime Minister’s AI Advisor and leading AI companies.
- Opportunity to shape the first and best-resourced public-interest research team focused on AI security.
Resources & access
- Pre-release access to multiple frontier models and ample compute.
- Extensive operational support so you can focus on research and ship quickly.
- Work with experts across national security, policy, AI research and adjacent sciences.
- If you’re talented and driven, you’ll own important problems early.
- 5 days off for learning and development, annual learning and development stipends, and funding for conferences and external collaborations.
- Freedom to pursue research bets without product pressure.
- Opportunities to publish and collaborate externally.
Life & family
- Modern central London office (cafes, food court, gym), or where applicable, option to work in similar government offices in Birmingham, Cardiff, Darlington, Edinburgh, Salford or Bristol.
- Hybrid working, flexibility for occasional remote work abroad and stipends for work-from-home equipment.
- At least 25 days’ annual leave, 8 public holidays, extra team-wide breaks and 3 days off for volunteering.
- Generous paid parental leave (36 weeks of UK statutory leave shared between parents + 3 extra paid weeks + option for additional unpaid time).
- On top of your salary, we contribute 28.97% of your base salary to your pension.
- Discounts and benefits for cycling to work, donations and retail/gyms.
Selection Process
In accordance with the Civil Service Commission rules, the following list contains all selection criteria for the interview process. The interview process may vary from candidate to candidate; however, you should expect a typical process to include some technical proficiency tests, discussions with a cross-section of our team at AISI (including non-technical staff), and conversations with your workstream lead. The process will culminate in a conversation with members of the senior team here at AISI. Candidates should expect to go through some or all of the following stages once an application has been submitted:
- Initial interview.
- Technical take home test.
- Second interview and review of take home test.
- Final interview with members of the senior leadership team.
Research Scientist – Science of Evaluation employer: AI Security Institute
Contact Detail:
AI Security Institute Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Research Scientist – Science of Evaluation role
✨Tip Number 1
Network like a pro! Reach out to people in the AI and evaluation fields on LinkedIn or at conferences. A friendly chat can open doors that applications alone can't.
✨Tip Number 2
Show off your skills! Prepare a portfolio or a presentation that highlights your past research and projects. When you get the chance, share it during interviews or networking events.
✨Tip Number 3
Stay updated on the latest trends in AI evaluation. Follow relevant blogs, podcasts, and journals. This knowledge will help you stand out in conversations and interviews.
✨Tip Number 4
Apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you're genuinely interested in joining our team.
Some tips for your application 🫡
Tailor Your Application: Make sure to customise your CV and cover letter to highlight your experience in evaluation methods and AI. We want to see how your skills align with our mission at the AI Security Institute, so don’t hold back on showcasing relevant projects!
Show Your Passion: Let your enthusiasm for AI safety and governance shine through in your application. We’re looking for candidates who are not just qualified but also genuinely excited about making an impact in this field.
Be Clear and Concise: When writing your application, keep it straightforward and to the point. We appreciate clarity, so avoid jargon unless it’s necessary. Make it easy for us to see why you’re a great fit for the role!
Apply Through Our Website: Don’t forget to submit your application through our official website! It’s the best way to ensure we receive your materials and can review them promptly. Plus, it shows you’re serious about joining our team.
How to prepare for a job interview at AI Security Institute
✨Know Your Stuff
Make sure you’re well-versed in the latest trends and methodologies in evaluation science and AI. Brush up on your knowledge of long-horizon agents and inference-compute scaling, as these are key areas for the role. Being able to discuss recent papers or breakthroughs will show your passion and expertise.
✨Prepare for Technical Questions
Expect some technical proficiency tests during the interview process. Review your past projects and be ready to explain your methodologies and results clearly. Practising how to articulate complex concepts in a straightforward manner can really help you stand out.
✨Show Your Collaborative Spirit
This role involves working closely with engineers and domain experts. Be prepared to discuss your experience in collaborative environments and how you’ve contributed to team projects. Highlight any partnerships or consulting experiences that demonstrate your ability to work well with others.
✨Ask Insightful Questions
At the end of the interview, don’t forget to ask questions! Inquire about the team’s current projects, challenges they face, or how they measure success. This shows your genuine interest in the role and helps you gauge if it’s the right fit for you.