Introductory resources for Singular Learning Theory

Technical AI safety

Matthew Cameron Farrugia-Roberts

Active grant
$10,650 raised ($10,530 funding goal)
Fully funded and not currently accepting donations.

Project summary

A 6-week salary to contribute to foundational resources in the nascent field of "Singular Learning Theory x AI Alignment".

Project goals

Produce a literature review on Singular Learning Theory, as a foundational resource to help orient newcomers to the field.
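
(For orientation, a one-formula sketch of the field: the central result of Singular Learning Theory is Watanabe's asymptotic expansion of the Bayesian free energy, F_n ≈ n L_n(w_0) + λ log n, in which the real log canonical threshold λ replaces the d/2 coefficient of classical regular-model theory such as the BIC.)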

How will this funding be used?

Salary for Matthew Farrugia-Roberts during the 6-week period (annualized rate: $91,260/year).
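
(At that rate, six weeks comes to $91,260 × 6/52 = $10,530, which matches the funding goal.)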

What is the recipient's track record on similar projects?

Matthew has already completed a detailed survey of the literature as part of his MS thesis, but it has yet to be written up as a standalone resource. This substantial preparation should enable the work to be completed relatively quickly.

Matthew has published a joint first-author theoretical ML paper at ICML, a top-tier venue, and completed an MS thesis at the University of Melbourne with a mark of 95+, a grade reserved for students 'typically encountered only a handful of times throughout an academic career'.

How could this project be actively harmful?

Singular Learning Theory provides a potential path to better understanding ML systems. Although a better understanding of systems can be helpful for safety, it could also yield insights that improve the efficiency of ML training procedures, potentially enabling more powerful systems to be trained sooner without a corresponding improvement in alignment. This risk applies to the science of deep learning and to interpretability methods in general; on balance, the benefits seem to outweigh the risks, but it is important to remain aware of the downside.

Singular Learning Theory is also a speculative research direction. Foundational resources will enable more people to on-board to it, but there is a possibility that it is a dead end and that these people would have been better off spending their time elsewhere. On balance, Singular Learning Theory seems worth exploring, and enabling newcomers to on-board more rapidly should decrease the overall cost of exploring this direction, provided resources are allocated efficiently.

What other funding is this person or project getting?

No other funding during this period. Matthew received an RA salary for a previous project and will receive an RA salary for a new project after this six-week project is complete.

Similar projects

Lawrence Chan: Exploring novel research directions in prosaic AI alignment. 3 month. Technical AI safety. $30K raised.

Scott Viteri: Attention-Guided-RL for Human-Like LMs. Compute Funding. Technical AI safety. $3.1K raised.

Jesse Hoogland: Scoping Developmental Interpretability. 6-month funding for a team of researchers to assess a novel AI alignment research agenda that studies how structure forms in neural networks. Technical AI safety. $145K raised.

Lucy Farnik: Discovering latent goals (mechanistic interpretability PhD salary). 6-month salary for interpretability research focusing on probing for goals and "agency" inside large language models. Technical AI safety. $1.59K raised.

Sandy Fraser: Concept-anchored representation engineering for alignment. New techniques to impose minimal structure on LLM internals for monitoring, intervention, and unlearning. Technical AI safety, Global catastrophic risks. $0 / $72.3K.

Matthew Cameron Farrugia-Roberts: AI Safety Reading Group at metauni [Retrospective]. Retrospective support for small virtual reading group on AI safety topics. Technical AI safety, EA Community Choice. $815 raised.

Matthew Farr: Collaboration to develop a DAG formalism to express instrumentality. Stipend to upskill under and collaborate with Sahil K and Topos for 4-6 months, seeking to obtain teleological DAGs as the dual of causal DAGs. Technical AI safety. $0 raised.

Jaeson Booker: Jaeson's Independent Alignment Research and work on Accelerating Alignment. Collective intelligence systems, Mechanism Design, and Accelerating Alignment. Technical AI safety. $0 raised.