Formation Research

Project summary

Nick Bostrom thought about lock-in in 2014, but there has been almost no empirical work on it since. I’m starting Formation research to minimise lock-in risk, to reduce the likelihood that humanity ends up in a lock-in in the future.

A lock-in is where features of the world, typically negative elements of human culture, are held stable for long periods of time.

A future without lock-in has continued technological and cultural evolution, economic growth, sustainable competition, and individual freedom. Lock-in prevents these properties from being true in the future. Lock-in prevents people from doing anything about it once it happens.

Example 1: an AGI gaining strategic control of humanity, enacting a dystopia where the AGI pursues a goal, preventing human intervention.

Example 2: a human dictator leveraging AI to increase their lifespan and implement worldwide surveillance, creating a stable totalitarian regime.

We believe this work is:

Impactful: lock-in may affect many or all people, and could last trillions of years;( (https://forum.effectivealtruism.org/posts/KqCybin8rtfP3qztq/agi-and-lock-in)

Neglected: no other organisations are working on lock-in;
Tractable: there are concrete actions we can take to reduce lock-in risks, as we demonstrate in our proposed action plan.

Over the grant period, we aim:

For the UK government to include lock-in as a key risk in a publication from DSIT or AISI
To publish at least one paper at a machine learning/autonomous systems conference or journal
Published an evaluation library for LLMs
Published a statistic: the probability of lock-in by year Y

What are this project's goals? How will you achieve them?

The goal of this project is to quantify and minimise lock-in risk – the likelihood that features of the world, typically negative elements of human culture that are harmful or oppressive, are held stable for long periods of time, akin to a totalitarian regime.

We plan to release a technical report making a predictive forecast for the likelihood of lock-in. We’ll use data and methods from AI control, predictive forecasting, and statistical modelling, to create models that quantify lock-in risks. We expect this work to resemble Forecasting TAI with Biological Anchors by Ajeya Cotra.

We will then use this model to develop a lock-in risk evaluation framework and benchmark, starting with large language models. Our objective is to have this framework added to AISI Inspect so that the government can use it to evaluate state of the art models.

We plan to use game theoretic simulations to model multi-agent lock-in scenarios to measure our forecasts and calibrate our statistics. We expect this work to resemble Games for AI Control by Griffin et al (2024).

We plan to publish our results at academic conferences and in technical reports, and to communicate our findings to the UK government to advocate the regulation of AI system architecture we identify as posing a lock-in risk. We also plan to collaborate with other organisations with similar theories of change, such as GovAI and the Centre for Long-Term Risk.

We will regularly review our own effectiveness throughout the 1 year grant period, and seek future funding at the end of the grant period pending the organisation’s success.

How will this funding be used?

1 year of general organisational support, including paying Alfie Lamerton to run the project, and paying for organisation fees such as conference tickets.

If the minimum funding amount is reached, Alfie Lamerton will run the project personally for the grant period. If the funding goal is reached, Alfie will seek to hire an additional researcher in the second half of the grant period pending the organisation's success.

Who is on your team? What's your track record on similar projects?

Alfie Lamerton will direct the project for 1 year. We do not expect to grow the team until at least the end of the funding period, once the organisation is more established. Alfie has previously received a grant for the initial development of this project, and has received a separate grant for previous AI safety research. Alfie has a strong background in computer science and quantitative research.

What are the most likely causes and outcomes if this project fails?

Our predictive modelling fails to make any strong claims about the potential for lock-in; or, our advocacy fails to affect change in the world. The outcome will be that the we explore alternative methods for reducing lock-in risk.

How much money have you raised in the last 12 months, and from where?

£4,000 in organisational support from BlueDot Impact in September 2024.