
Funding requirements

  • Sign grant agreement
  • Reach min funding
  • Get Manifund approval

[Retroactive] Funding for developing new "substrate-flexible risk" threat model

Science & technology · Technical AI safety

Matthew Farr

Proposal · Grant
Closes March 17th, 2026
$0 raised
$10,920 minimum funding
$21,840 funding goal

42 days left to contribute

Project summary

I am seeking retroactive funding for a research project that I have been pursuing, self-funded and part-time, for approximately a year.

The project was initiated at MATS 6.0, and I have continued it as my main research focus since. It identified and clarified implicit assumptions in mech-interp's theory of change, demonstrated how these assumptions are likely to fail in future intelligences, and introduced a new threat model ("substrate-flexible risks") to the literature.

The resulting work has been accepted at multiple conferences and is now part of BlueDot's Technical AI Safety Curriculum.

What are this project's goals? How will you achieve them?

The project's aim was to have an impact on the AI safety portfolio and on the attitudes and tastes of empirical researchers. More explicitly, the goal was to raise awareness of the assumptions that underpin mech-interp and to highlight a direction in which interpretability might continue.

I believe that this work has had, and will continue to have, substantial impact, for the following reasons:

  • The initial position paper was:

    • Accepted for poster presentation at the Tokyo AI Safety Conference 2025 (I attended with financial assistance from Manifund and a private donor connected to my mentor).

    • Published in the conference proceedings.

  • A second, expanded version of that paper has been:

    • Accepted for publication (forthcoming) in the Proceedings of Odyssey 2025, where it was also presented.

    • Included in BlueDot's Technical AI Safety Curriculum.

    • Presented as a workshop at HAAISS 2025.

How will this funding be used?

The funding sought is retroactive, for work already completed. I estimate my contribution as equivalent to 1-2 days per week for a year.
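
For concreteness, the two funding tiers map directly onto that estimate at an implied rate of $210 per day over 52 weeks (the day rate is not stated explicitly; it is back-calculated from the posted minimum and goal):

  • Minimum funding: 1 day/week × 52 weeks × $210/day = $10,920
  • Funding goal: 2 days/week × 52 weeks × $210/day = $21,840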

Who is on your team? What's your track record on similar projects?

I am the lead author and coordinator of the project. The project forms part of Sahil's broader agenda, and both versions of the paper have received considerable mentorship and writing assistance from him.

Chris Pang was my co-author on the initial paper. For the second iteration, Aditya Prasad was my main co-author, with further contributions from Aditya Adiga and Jayson Amati.

With the exception of Chris, we are all affiliated with Groundless in some capacity.

What are the most likely causes and outcomes if this project fails?

The project has so far been a success. The work has been accepted at both editor-reviewed and peer-reviewed conferences and added to BlueDot's curriculum, and it is being amplified and spotlighted in the appropriate places for it to continue to have an impact.

How much money have you raised in the last 12 months, and from where?

I received ~2,000 USD to attend the Tokyo AI Safety Conference and present my work. This was funded primarily by a private donor, with additional support from a Manifund grant.

I am participating in the FIG Fellowship and received ~1,370 USD as an honorarium.
