Agency and (Dis)Empowerment

Technical AI safety

Damiano Fornasiere and Pietro Greiner

Active grant
$60,250 raised
$60,000 funding goal
Fully funded and not currently accepting donations.

Project summary


Six months of support for two people, Damiano and Pietro, to write a paper about (dis)empowerment with Jon Richens, Reuben Adams, Tom Everitt, and Victoria Krakovna. This application was suggested by Evan Hubinger.

Project goals


Produce a paper about agency and (dis)empowerment in the context of multi-agent causal models.


The ultimate aim is to offer formal and operational notions of (dis)empowerment. An intermediate step, for example, would be to provide a continuous formalisation of agency and to investigate which conditions increase or decrease it.

These definitions should be:

  • Conceptually satisfactory, so as to resolve ambiguity around the concept;

  • Operationally implementable, both to identify agents from causal models and to train models (RL agents and LLMs alike) so that they are not disempowering, e.g., so that they do not reduce the agency of other agents (including us); a toy sketch of the kind of causal model involved appears at the end of this section.


We anticipate a thorough study of these forms of (dis)empowerment, including a literature review, motivating examples and threat models, mathematical formalisations, sound and complete criteria to characterise (dis)empowerment, and the study of concrete examples.
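
To make the causal-model setting slightly more concrete, here is a minimal sketch, in Python, of the kind of object (a multi-agent causal influence diagram) on which agent identification and (dis)empowerment criteria would be evaluated. The class, node names, and structure below are purely illustrative assumptions, not the formalism the paper will use.

```python
from dataclasses import dataclass

@dataclass
class CausalInfluenceDiagram:
    """A toy multi-agent causal influence diagram: a DAG whose nodes are chance,
    decision, or utility variables, with decisions and utilities owned by agents."""
    chance: set[str]
    decisions: dict[str, str]      # decision node -> owning agent
    utilities: dict[str, str]      # utility node  -> owning agent
    parents: dict[str, list[str]]  # node -> its parent nodes in the DAG

    def agents(self) -> set[str]:
        # An "agent" here is simply any owner of a decision or utility node.
        return set(self.decisions.values()) | set(self.utilities.values())

# Two agents, A and B, whose decisions jointly influence a shared variable X,
# which in turn determines each agent's utility.
cid = CausalInfluenceDiagram(
    chance={"X"},
    decisions={"D_A": "A", "D_B": "B"},
    utilities={"U_A": "A", "U_B": "B"},
    parents={"D_A": [], "D_B": [], "X": ["D_A", "D_B"], "U_A": ["X"], "U_B": ["X"]},
)
print(cid.agents())  # {'A', 'B'}
```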

How will this funding be used?


- $30k: salary support for two people for six months;

- $10k: expenses for three or four trips to London (rent, flights and local transport, food, …);

- $20k: taxation.
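
In total, $30k + $10k + $20k = $60k, which matches the funding goal above.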

What is the recipient's track record on similar projects?


Damiano is a PhD student in mathematics. Pietro is an M.Sc. student in mathematics.

A first draft of the project, with preliminary results, was delivered by Victoria Krakovna to Jon Richens and Tom Everitt as part of the selection for Damiano's research phase of MATS. Jon and Tom agreed to co-supervise the project and to produce a paper from it. Pietro contributed significantly to the deliverable, proofreading it and sharing and discussing ideas; consequently, he joined the project.


For the preliminary results, the scope of the project was narrowed to offering two counterfactual definitions of disempowerment, providing sound and complete criteria to detect them in causal models, and studying how they relate to each other.

When we say "A is (dis)empowered by a policy π_B of B", we understand that both A and B can be agents, groups of agents, or (parts of) the environment itself (so π_B could be a tuple of policies):

  • Disempowerment as loss of decisional power: if B chose a different policy, A could make a decision other than its best response to π_B that would yield A greater utility (a toy sketch of this notion follows the list);

  • Disempowerment as the need to cause certain outcomes: A has an incentive to control a set of variables X and is able to control X, but A's best response to π_B is constrained with respect to X, in the sense that if the outcome of X were guaranteed regardless of A's actions, then A could pursue a different policy that would be at least as good.
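
As an illustration of the first notion, here is a minimal sketch in Python under one possible reading of the definition: in a toy two-action game, A is decisionally disempowered by π_B if some alternative policy for B would change A's best response and leave A strictly better off. The game matrix and the grid of alternative policies are invented for the example and are not results from the project.

```python
import numpy as np

# Toy two-action game, invented for illustration: U_A[a, b] is A's utility when
# A plays action a and B plays action b.
U_A = np.array([
    [3.0, 0.0],
    [2.0, 2.0],
])

def best_response(policy_B):
    """A's best-response action and its expected utility against a distribution over B's actions."""
    expected = U_A @ policy_B
    return int(np.argmax(expected)), float(np.max(expected))

def decisionally_disempowered(policy_B, alternative_policies_B):
    """One possible reading of 'disempowerment as loss of decisional power':
    some alternative policy for B would change A's best response and strictly
    increase A's utility at that new best response."""
    br, value = best_response(policy_B)
    return any(
        alt_br != br and alt_value > value
        for alt_br, alt_value in map(best_response, alternative_policies_B)
    )

# B currently always plays its second action; alternatives are a grid of mixed policies.
pi_B = np.array([0.0, 1.0])
alternatives = [np.array([p, 1.0 - p]) for p in np.linspace(0.0, 1.0, 11)]
print(decisionally_disempowered(pi_B, alternatives))  # True
```

With π_B concentrated on B's second action, A's best response is its second action with utility 2; if B instead played its first action, A's best response would switch to its first action with utility 3, so the check returns True.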

How could this project be actively harmful?


Any mathematical formalisation of a notion that is to be avoided can also be used to train models that optimise for that notion; this project is no exception.

What other funding is this person or project getting?


Pietro is a master's student receiving no other funding. Damiano receives a salary as a PhD student in Barcelona.

Similar projects

  • Francesca Gomez, "Develop technical framework for human control mechanisms for agentic AI systems" (Technical AI safety, AI governance; $10K raised): Building a technical mechanism to assess risks, evaluate safeguards, and identify control gaps in agentic AI systems, enabling verifiable human oversight.

  • Damiano Fornasiere and Pietro Greiner, "Relocating to Montreal to work full time on AI safety" (Science & technology, Technical AI safety; $10K raised).

  • SaferAI, "General support for SaferAI" (AI governance; $100K raised): Support for SaferAI's technical and governance research and education programs to enable responsible and safe AI.

  • Lucy Farnik, "Discovering latent goals (mechanistic interpretability PhD salary)" (Technical AI safety; $1.59K raised): 6-month salary for interpretability research focusing on probing for goals and "agency" inside large language models.

  • Jesse Hoogland, "Scoping Developmental Interpretability" (Technical AI safety; $145K raised): 6-month funding for a team of researchers to assess a novel AI alignment research agenda that studies how structure forms in neural networks.

  • Kyle Gracey, "AI Policy Breakthroughs — Empowering Insiders" (AI governance; $20K raised): Strategy consulting support for AI policymakers.

  • Dhruv Sumathi, "AI For Humans Workshop and Hackathon at Edge Esmeralda" (Science & technology, Technical AI safety, AI governance, Biosecurity, Global catastrophic risks; $0 raised): Talks and a hackathon on AI safety, d/acc, and how to empower humans in a post-AGI world.

  • Rubi Hudson, "Avoiding Incentives for Performative Prediction in AI" (Technical AI safety; $33.2K raised).