@Austin Got it. Thanks for the feedback!
Austin Chen
4 days ago
(responding here to a Discord DM, to provide public feedback)
Congrats on the selection! I'm not very familiar with this area, and this writeup, the Axoniverse website, and the pitch video don't do a great job of explaining what your overall plans are or why you'd be well qualified to execute on them. For those reasons I'm declining to fund this at this time, but do let us know how the 5x5 pitch goes, and if you end up winning the grant, I'd be curious to learn more!
Vishnu Muthyala
4 days ago
Update: I got selected to pitch at 5x5 night. It's a $5,000 grant and will be used for the hardware. The pitch needs to be in person, so I have to travel to Michigan and need funds for that.
Remmelt Ellen
4 days ago
Many thanks to the proactive donors who supported this fundraiser! It got us out of a pickle, giving us the mental and financial space to start preparing for edition 10.
Last week, we found funds to cover backpay plus edition 10 salaries. There is money left to cover some stipends for participants from low-income countries, and a trial organiser to help evaluate and refine the increasing number of project proposals we receive.
That said, donations are needed to cover the rest of the participant stipends, plus runway for edition 11. If you can continue to reliably support AI Safety Camp, we can reliably run editions, and our participants can rely on having some of their living costs covered while they do research.
P.S. Check out summaries of edition 9 projects here. You can also find the public recordings of presentations here.
Lawrence Chan
7 days ago
See my progress update here: https://manifund.org//projects/exploring-novel-research-directions-in-prosaic-ai-alignment?tab=comments#796ba4d9-f6e7-441f-9858-db4ce03a56a2
$30k -- salary + taxes
(My compute was provided by Constellation and FAR AI.)
Lawrence Chan
7 days ago
(Un)fortunately, a lot of the research areas I was interested in exploring have become substantially more mainstream since I wrote the research proposal. For example, Stephen Casper and collaborators have put out their latent adversarial training paper, FAR has completed their work on adversarial example/training scaling laws for transformers, and many at Anthropic and other labs are investing significant amounts of time and resources into adversarial training and related areas.
Instead, I've done work trying to lay the theoretical groundwork behind ideas in mechanistic interpretability. While there's been a lot of incremental empirical work on improving SAEs or applying SAEs to larger models, there are many theoretical questions in interp that are much more neglected. Specifically, over the course of the grant, I've worked on and completed the following two projects:
Compact Proofs and Mechanistic Interp: There's been a small but steady amount of ongoing discussion on methods to evaluate circuits or mechanistic explanations in general. On one end, we have sampling-based methods like causal scrubbing, and on the other end, we have proofs. So it's natural to explore the question -- can we use proof length/quality to evaluate the degree of mechanistic understanding? Can we even write non-vacuous proofs about model behavior at all? We've completed preliminary work showing that the answer to both questions is yes on small max-of-K transformers: blog post, paper.
Models of Computation in Superposition: There's an implicit model of computation in superposition that people in mech interp seem to rely on, where models are using superposition to approximately represent a sparse boolean circuit. In contrast, the standard toy models of superposition (Anthropic's TMS, which focuses on representational superposition, and Scherlis et al.'s model involving quadratic forms) only consider superposition occurring at a single layer. With some collaborators, we've built out a model of superposition that is closer to the implicit model, where superposition both allows for more compact computation of larger circuits and can be maintained across many layers (paper).
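For readers less familiar with the single-layer toy model contrasted against above, here is a minimal illustrative sketch of a TMS-style setup: many sparse features are compressed into fewer hidden dimensions and reconstructed through a tied-weight ReLU decoder, so superposition only occurs at one layer. This is not code from the grant or from either paper; all dimensions and hyperparameters are made up for illustration.

```python
# Minimal TMS-style toy model of superposition (illustrative only):
# n_features sparse features are squeezed into d_hidden < n_features
# dimensions and reconstructed with a tied-weight ReLU decoder.
import torch

n_features, d_hidden, batch_size, feature_prob = 20, 5, 1024, 0.05

W = torch.nn.Parameter(0.1 * torch.randn(d_hidden, n_features))
b = torch.nn.Parameter(torch.zeros(n_features))
opt = torch.optim.Adam([W, b], lr=1e-3)

for step in range(5_000):
    # Sparse inputs: each feature is active (uniform in [0, 1]) with prob feature_prob.
    active = (torch.rand(batch_size, n_features) < feature_prob).float()
    x = active * torch.rand(batch_size, n_features)

    h = x @ W.T                    # compress n_features into d_hidden dims (superposition)
    x_hat = torch.relu(h @ W + b)  # tied-weight reconstruction
    loss = ((x - x_hat) ** 2).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()
```

With enough sparsity, the learned W packs more features than dimensions by tolerating small interference between them; the multi-layer model described above asks what happens when such compressed features are also used for computation across many layers, rather than just stored and read back out.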
I'm deciding between returning to METR, my PhD, or other job opportunities.
If I do pursue a job that allows me to do related research, I'll probably follow up on the projects as follows:
Proofs and Mech Interp: I don't believe that formal proofs will scale to frontier models, but working on the project has made me more convinced of the feasibility of building formal-ish systems for doing mech interp. A natural follow-up would be ARC Theory's heuristic arguments work (as laid out in Jacob's recent blog post), which neatly sidesteps one of the main issues with scaling proofs. I'd probably focus on empirical work applying heuristic arguments to transformer models.
Models of Superposition: Over the course of this work, I've become convinced that the model of networks as computing sparse boolean circuits is incorrect. Instead, it seems like every interesting sort of computation requires inhibition or competition between features, and sparse boolean circuits do not allow for inhibition. I think building a model of how inhibition works for features in superposition is the natural next step. (Relatedly, see this post by Redwood on how inhibition allows for substantially more representational power in a one-layer attention transformer than "just" skip-trigrams.)
I'm pretty confused about what to do career-wise -- any advice would be appreciated.
Florent Berthet
8 days ago
@Haiku Hi Nathan, yes you can support CeSIA by giving to EffiSciences (CeSIA's current legal structure until we register it as a separate non-profit) using the following link: https://www.helloasso.com/associations/effisciences/formulaires/1/en
If your name appears in your donation, I will know to allocate all the funds to CeSIA. If not, feel free to reach out to me at florent@securite-ia.fr to confirm how much and when you donated.
And thanks a lot for your support, this is really helpful! ❤️
Nathan Metzger
8 days ago
This Manifund Project had a high minimum funding amount, a short fuse, and very low public visibility. Is there another way to support CeSIA, since I was unable to do so via this Project?
Yaniv Ben-Ami
11 days ago
I believe in Brandon. His book is going to be very, very important and will shape how a lot of people raise their kids for a long time. It's long-range on the children's side, but it meets immediate needs on the parents' side. Every dollar gives him breathing room and improves the quality of the end product.
Nathan Metzger
13 days ago
The AI Action Summit will be held in France in February, which makes France a strategically valuable country for communication about AI risk. I believe efforts there are severely underfunded with respect to their potential impact.
Luan Rafael Marques de Oliveira
15 days ago
At the beginning of June, I finished the translation of the Alignment Curriculum. It is published open access on the Brazilian website 80000horas.com.br:
https://80000horas.com.br/alinhamento-da-ia-um-curso-introdutorio/
We also have a translation of the Governance Curriculum:
https://80000horas.com.br/fundamentos-de-seguranca-de-ia/governanca-da-ia-um-curso-introdutorio/
Unfortunately, we don’t have many great prospects for putting this material to good use, at least in the next semester, although we’ve had some hopeful developments:
-Our project for an AI Safety Fundamentals virtual programme in Portuguese hasn’t got off the ground yet.
-One study group (at the University of São Paulo - Ribeirão Preto) seems to be currently inactive, and another (at Getúlio Vargas Foundation - Rio) is facing some uncertainty about how to proceed next semester.
-Fortunately, a formerly inactive group at the University of São Paulo - São Paulo is coming back to life and has expressed interest in the material.
-At the end of June, we organized a small, experimental local EA conference here in Brazil, and it was quite successful (around 200 people showed up). AI Safety was among its main subjects, and that might get some people interested in forming more groups in the future.
https://forum.effectivealtruism.org/posts/J5miiEvm72mPCr8uk/announcing-ea-brazil-summit
Austin Chen
16 days ago
@Klingefjord awesome to hear that you're already thinking about this; agreed on the tradeoffs you mentioned. Let me know if/when you're looking for investment too -- Manifund allows our regrantors to make for-profit investments in addition to grants!
Francis Dierick
16 days ago
@Austin I did just that. Got a first demo running on Discord. Details updated in proposal.
Oliver Klingefjord
16 days ago
@Austin Thank you!
Just wanted to address your hesitation – in addition to the paper, we're planning a for-profit spin-out structure. We want to do that well so as not to be co-opted by perverse incentives or forced into building something "less than what could be", but it's part of the plan.
Personally, I think startups are unparalleled execution vehicles but (sometimes) poor research vehicles. For-profit incentives and/or investor pressure can lock you down prematurely. Conversely, non-profit research orgs (like us) are able to pursue open-ended questions freely, but are limited in scaling and execution power.
Austin Chen
17 days ago
I'm funding this project as it features a strong team, endorsements from many folks I respect (including Liv Boeree, Tim Urban, CIP and OpenAI), and investigates an angle that I'm personally interested in. I'm kind of a sucker for anything flavored "AI x Markets" (see also: AI Objectives Institute), and think that there's a large shortcoming in current social technology for understanding and fulfilling human preferences.
My biggest hesitation is: I'm baseline skeptical of a primary goal of producing research papers, aka I'm biased towards startups over research orgs. So for instance, I'd be more excited than I already am if the goal was more like "build an LLM coordinator that 200 people find useful and are willing to pay for" -- so produce a tool that people use and are excited to keep using. On a very quick read it seems like MAI's work on this project, if successful, could be extended in that direction? Like, if I was personally convinced an LLM coordinator could help me spend money better than I would normally, I should be willing to pay money for that service.
Tassilo Neubauer
17 days ago
This is one of the few agendas that give me [glimpses of hope](https://www.lesswrong.com/posts/j9Q8bRmwCgXRYAgcJ/miri-announces-new-death-with-dignity-strategy#comments). I don't have that much money to spare at the moment, but I just got a job and I can't believe this still has so little funding, so 10% of my income so far might as well go here.
The only reason I hesitated to donate is that I still feel confused about the capability externalities of most work on interpretability.
Austin Chen
18 days ago
@c1sc0 Yup, you're fine to keep editing the proposal! Consider adding a changelog at the end to briefly document what and when things changed.
Francis Dierick
18 days ago
Thanks for the encouragement @Austin ... I'm definitely moving forward with this. Quick question: can I keep editing the proposal until the deadline? I'd like to do weekly progress updates. @anyone interested: for the nitty-gritty details & daily progress, check out my notes on c1sc0.me
Austin Chen
19 days ago
Hey Francis, thanks for proposing this project. I appreciate your background with software and Chalk Rebels, as well as the work-in-public ethos you have with your Obsidian notes -- e.g. it was cool to read more of your thoughts on AI Arena here. You also seem to be moving quickly, though it's only been a few days since you started on this project.
Since it's so early, I'm going to hold off on personally offering funding for a couple of weeks, but will be excited to read about any updates you have to share, especially if you find that people are starting to participate in the challenges!