
Funding requirements

  • Sign grant agreement
  • Reach min funding
  • Get Manifund approval

[Retroactive] Funding for developing new "substrate-flexible risk" threat model

Science & technology · Technical AI safety

Matthew Farr

Proposal · Grant
Closes March 17th, 2026
$0 raised
$10,920 minimum funding
$21,840 funding goal

42 days left to contribute

Project summary

I am seeking retroactive funding for a research project that I have been pursuing, self-funded and part-time, for approximately a year.

The project was initiated at MATS 6.0, and I have continued it as my main research focus since. It identified and clarified implicit assumptions in mech-interp's theory of change, demonstrated how these assumptions are likely to fail in future intelligences, and introduced a new threat model ("substrate-flexible risks") to the literature.

The resulting work has been accepted at multiple conferences and is now part of BlueDot's Technical AI Safety Curriculum.

What are this project's goals? How will you achieve them?

The project's aim was to have an impact on the AI safety portfolio and on the attitudes and tastes of empirical researchers. More explicitly, the goal was to raise awareness of the assumptions that underpin mech-interp and to highlight a direction in which interpretability might continue.

I believe that this work has had, and will continue to have, substantial impact, for the following reasons:

  • The initial position paper was:

    • Accepted for poster presentation at the Tokyo AI Safety Conference 2025 (I attended with financial assistance from Manifund and a private donor connected to my mentor).

    • Published in the conference proceedings.

  • A second, expanded version of that paper has been:

    • Accepted for publication (forthcoming) in the Proceedings of Odyssey 2025, where it was also presented.

    • Included in BlueDot's Technical AI Safety Curriculum.

    • Presented as a workshop at HAAISS 2025.

How will this funding be used?

The funding sought is retroactive, for work already completed. I estimate my contribution as equivalent to 1-2 days per week for a year.
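
For concreteness, the two funding tiers map directly onto that estimate at an implied rate of $210 per day over 52 weeks (the day rate is not stated explicitly; it is back-calculated from the posted minimum and goal):

  • Minimum funding: 1 day/week × 52 weeks × $210/day = $10,920
  • Funding goal: 2 days/week × 52 weeks × $210/day = $21,840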

Who is on your team? What's your track record on similar projects?

I am the lead author and coordinator of the project. The project forms part of Sahil's broader agenda, and both versions of the paper have received considerable mentorship and writing assistance from him.

Chris Pang was my co-author on the initial paper. For the second iteration, Aditya Prasad was my main co-author, with further contributions from Aditya Adiga and Jayson Amati.

With the exception of Chris, we are all affiliated with Groundless in some capacity.

What are the most likely causes and outcomes if this project fails?

The project has so far been a success. The work has been accepted at both editor-reviewed and peer-reviewed conferences and added to BlueDot's curriculum, and it is being amplified and spotlighted in the appropriate places for it to continue to have an impact.

How much money have you raised in the last 12 months, and from where?

I received ~2,000 USD to attend the Tokyo AI Safety Conference and present my work. This was funded primarily by a private donor, with additional support from a Manifund grant.

I am participating in the FIG Fellowship and received ~1,370 USD as an honorarium.
