Compute and infrastructure costs
Ihonwa Omorogieva Sylvester
Making education worthwhile in Nigeria for teachers using AI
6-month funding for a team of researchers to assess a novel AI alignment research agenda that studies how structure forms in neural networks
Help us support more research scholars!
Hire 3 additional AI safety research engineers/scientists
Trajectory Models and Agent Simulators
6-month salary for interpretability research focusing on probing for goals and "agency" inside large language models
1.9 FTE for 9 months to pilot a training program in Manila focused exclusively on mechanistic interpretability
Ethan Josean Perez
4 different projects (finding RLHF alignment failures, debate, improving CoT faithfulness, and model organisms)
Six-month support for a Program Manager to organize and execute international AI safety hackathons with Apart Research
Damiano Fornasiere and Pietro Greiner
6 months support for Damiano and Pietro to write a paper about (dis)empowerment with Jon Richens, Reuben Adams, Tom Everitt, and Victoria Krakovna.
Proving Computational Hardness of Verifying Alignment Desiderata
Allow a Berkeley PhD student to devote more time and focus to AI safety research and mentorship.
We're a team of SERI-MATS alumni working on interpretability, seeking funding to continue our research after our LTFF grant ended.
Matthew Cameron Farrugia-Roberts
Literature review or other introductory material for Singular Learning Theory targeted at AI alignment applications
AI Safety lab focusing on technical alignment and governance of AI in Africa and the Global South more broadly. We are a grassroots community-led research lab
Play the Game. Rack Up Points. Save Humanity?
What is the probability that AGI is developed by January 1, 2043?
Advances in LLMs are a National Security Risk
By Marius Hobbhahn
By Froolow. Parameter Uncertainty in AI Future Forecasting
By TD_Pilditch. Uncertainty, Structure, and Optimal Integration.
By David Johnston. Why generative models handle seemingly ill-posed problems
By Trevor Klee
By Boaz Barak & Ben Edelman
By Kiel Brennan-Marquez
By Peter S. Park. It is plausible that AGI safety research should be assumed compromised once it is posted on the Internet.
Evaluating Multipolar Cooperation Failures Between Autonomous Language Model Systems
Funding to establish a safety and interpretability lab within the Torr Vision Group (TVG) at Oxford
Siao Si Looi
12 months funding for 3 people to work full-time on projects supporting AI safety efforts
3-month salary for AI safety work on deconfusion and technical alignment.
1-year salary for independent research to investigate how LLMs know what they know.
9-month university tuition support for technical AI safety research focused on empowering AI governance interventions.
A scalable, non-infohazardous way to quickly upskill via digestible, repeatable exercises from papers and workshops.
An association for interdisciplinary interest in AI
We build a scalable "Automated Circuit Discovery" method and investigate "Cleanup Behavior" to advance the interpretability of transformer models.
Aryeh L. Englander
Continuation of a previous grant to allow me to pursue a PhD in risk and decision analysis related to AI x-risks
Collective intelligence systems, Mechanism Design, and Accelerating Alignment
Making a simple, easy-to-read platform where alignment plans and their criticisms can be viewed and ranked. Currently in Stage 1.
Miguelito De Guzman
6 months of work: Evaluating a variant of GPT2-XL that can simulate a shutdown activation, aiming to improve alignment theory & develop interpretability tools.