AI Security Institute

Research Scientist - Safeguards

On-site

London, United Kingdom

Full Time

12-10-2025


Skills

Monitoring, Decision-making, Research, Training, Machine Learning, PyTorch, Benchmarking, OpenAI, Large Language Models

Job Specifications

About The AI Security Institute

The AI Security Institute is the world's largest and best-funded team dedicated to understanding advanced AI risks and translating that knowledge into action. We're in the heart of the UK government with direct lines to No. 10, and we work with frontier developers and governments globally.

We're here because governments are critical for advanced AI going well, and UK AISI is uniquely positioned to mobilise them. With our resources, unique agility and international influence, this is the best place to shape both AI development and government action.

Team Description

Interventions that secure a system from abuse by bad actors will grow in importance as AI systems become more advanced and integrated into society. The AI Security Institute's Safeguard Analysis Team researches these interventions: we evaluate the protections on current frontier AI systems and research what measures could better secure them in the future. We then share our findings with the frontier AI companies, key UK officials, and other governments - informing their deployment, research, and policy decision-making.

We have published on several topics, including agent misuse, defending finetuning APIs, third-party attacks on agents, safeguards safety cases, and attacks on layered defenses. Some example impacts have been advancing the benchmarking of agent misuse, identifying safeguard vulnerabilities previously unknown to frontier AI companies, and producing insights into the feasibility and effectiveness of attacks and defences in data poisoning and fine-tuning APIs.

On our team, you can also substantially advance both research on how to attack and defend frontier AI and governments' understanding of misuse risks, which we see as critical to advanced AI going well.

Role Description

We're looking for researchers with expertise in developing and analysing attacks and protections for systems based on large language models, or with broader experience in frontier LLM research and development. An ideal candidate would have a strong record of performing and publishing novel and impactful research in these or other areas of LLM research.

We're primarily looking for research scientists, but we can support staff whose work spans or alternates between research and engineering. The broader team's work includes research - like assessing the threats to frontier systems, performing novel adversarial ML research on frontier LLMs, and developing novel attacks - and engineering, such as building infrastructure for running evaluations.

The team is currently led by Xander Davies and advised by Geoffrey Irving and Yarin Gal. You'll work with incredible technical staff across AISI, including alumni from Anthropic, OpenAI, DeepMind, and top universities. You may also collaborate with external teams like Anthropic, OpenAI, and Gray Swan.

We are open to hires at junior, senior, staff and principal research scientist levels.

Representative projects you might work on

Designing, building, running and evaluating methods to automatically attack and evaluate safeguards, such as LLM-automated attacking and direct optimisation approaches.
Building a benchmark for asynchronous monitoring for signs of misuse and jailbreak development across multiple model interactions.
Investigating novel attacks and defences for data poisoning LLMs with backdoors or other attacker goals.
Performing adversarial testing of frontier AI system safeguards and producing reports that are impactful and action-guiding for safeguard developers.

What We're Looking For

In accordance with the Civil Service Commission rules, the following list contains all selection criteria for the interview process.

Required Experience

The experiences listed below should be interpreted as examples of the expertise we're looking for, as opposed to a list of everything we expect to find in one applicant:

You May Be a Good Fit If You Have

Hands-on research experience with large language models (LLMs) - such as training, fine-tuning, evaluation, or safety research.
A demonstrated track record of peer-reviewed publications in top-tier ML conferences or journals.
Ability and experience writing clean, documented research code for machine learning experiments, including experience with ML frameworks like PyTorch or evaluation frameworks like Inspect.
A sense of mission, urgency, and responsibility for success.
An ability to bring your own research ideas and work in a self-directed way, while also collaborating effectively and prioritising team efforts over extensive solo work.

Strong Candidates May Also Have

Experience working on adversarial robustness, other areas of AI security, or red teaming against any kind of system.
Extensive experience writing production-quality code.
Desire to and experience with improving our team through mentoring and feedback.
Experience designing, shipping, and maintaining complex technical products.

What We Offer

Impact you couldn't have anywhere else

About the Company

We’re building a team of world-leading talent to advance our understanding of frontier AI and strengthen protections against the risks it poses – come and join us: https://www.aisi.gov.uk/. The AISI is part of the UK Government's Department for Science, Innovation and Technology.