I am seeking retroactive funding for a research project I have been pursuing, self-funded and part-time, for approximately a year.
The project was initiated at MATS 6.0, and I have continued it since as my main research focus. It identified and clarified implicit assumptions in mech-interp's theory of change, demonstrated how these assumptions are likely to fail in future intelligences, and introduced a new threat model ("substrate-flexible risks") to the literature.
It has been accepted at multiple conferences and is now part of BlueDot's Technical AI Safety Curriculum.
The project's aim was to have an impact on the AI safety portfolio and on the attitudes and tastes of empirical researchers. More explicitly, it aimed to raise awareness of the assumptions that underpin mech-interp and to highlight a direction in which interpretability research might continue.
I believe that this work has had, and will continue to have, substantial impact, for the following reasons:
The initial position paper was:
Accepted for poster presentation at the Tokyo AI Safety Conference 2025 (I attended with financial assistance from Manifund and a private donor connected to my mentor).
Published in the conference proceedings.
A second, expanded version of that paper has been:
Accepted for publication (forthcoming) in the Proceedings of Odyssey 2025, where it was also presented.
Included in BlueDot's Technical AI Safety Curriculum.
Presented as a workshop at HAAISS 2025.
The funding sought is retroactive, for work already completed. I estimate my contribution at the equivalent of 1-2 days per week over a year.
I am the lead author and coordinator of the project. The project forms part of Sahil's broader agenda, and both versions of the paper benefited from his considerable mentorship and writing assistance.
Chris Pang was my initial co-author. For the second iteration of the paper, Aditya Prasad was my main co-author, and the work included contributions from Aditya Adiga and Jayson Amati.
With the exception of Chris, we are all affiliated with Groundless in some capacity.
The project has so far been a success: it has been accepted at editor-reviewed and peer-reviewed conferences, added to BlueDot's curriculum, and is being amplified and spotlighted in the places where it can continue to have an impact.
I received ~2000 USD to attend the Tokyo AI Safety Conference and present my work. This was funded mostly by a private donor, with additional support from a Manifund grant.
I am participating in the FIG Fellowship and received ~1370 USD as an honorarium.