
Our priority research areas

We are interested in funding projects to develop mitigations to safety and security risks from misaligned AI systems.

Our research goals

In-scope projects will aim to address either of the following challenges:

  1. How can we prevent AI systems from carrying out actions that pose risks to our collective security, even if they attempt to carry out such actions?
  2. How can we design AI systems that do not attempt to carry out such actions in the first place?

Making substantial breakthroughs in these areas is an interdisciplinary effort, requiring a diversity of tools and perspectives. We want the best and the brightest across many fields to contribute to alignment research, so we have organised these priority research areas as a set of discipline-specific questions. We suggest clicking ahead to your specific areas of interest, rather than reading linearly. Sections are roughly ordered from most theoretical to most empirical.

Some of the subfields below have more detail than others about subproblems, recent work, and related work. This should not be read as a signal about which areas we believe are more important: much of the variance reflects the areas we or our collaborators have focused on to date. We want to bring other areas up to similar levels of detail, and will attempt to do this in future versions of this agenda.

We’re excited about projects that tackle these challenges, even if they aren’t focused on a specific problem outlined below. Feel free to look at others’ lists and overviews (e.g. Google DeepMind, Anthropic, or Redwood Research) for ideas. If you see connections between your research and these challenges, we encourage you to submit a proposal.