Decode Research - Compute for Generating Dashboards & Autointerp

Technical AI safety

Johnny Lin

Not funded · Grant · $0 raised

Project summary

This is a targeted grant to generate a specific set of data for Decode Research on Neuronpedia: "feature dashboards" and auto-interp explanations for a new set of SAEs. The amount has been pre-agreed upon.

What are this project's goals and how will you achieve them?

This is for a specific compute-heavy task called "feature dashboards" for Decode Research.

We have already generated about 20% of these dashboards. An example of a dashboard is: http://neuronpedia.org/gemma-2-2b/0-gemmascope-res-16k/34

Here's a listing of each SAE that will have dashboards (some are still hidden for now):

https://www.neuronpedia.org/gemma-scope#browse

The overall Gemma Scope project on Neuronpedia is here: https://www.neuronpedia.org/gemma-scope#main

In total, we plan to generate 40,000,000 dashboards and run auto-interp on them as well. Auto-interp means asking an LLM such as GPT-4 to produce a natural-language explanation of what a feature represents.
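To make the auto-interp step concrete, here is a minimal sketch of how an explanation prompt might be assembled from a feature's top-activating examples. The function name, prompt wording, and the `<< >>` marker convention are illustrative assumptions, not Neuronpedia's actual pipeline.

```python
# Hypothetical sketch of the auto-interp prompting step. A feature's
# top-activating text snippets are formatted into a prompt asking an LLM
# to describe the concept the feature responds to.

def build_autointerp_prompt(top_examples):
    """Format top-activating examples into an explanation request.

    top_examples: list of (snippet, peak_activation) pairs, where the
    token that fired most strongly is wrapped in << >> (an assumed
    convention for this sketch).
    """
    lines = [
        "The following text snippets all strongly activate one feature of a",
        "sparse autoencoder. The max-activating token is marked <<like this>>.",
        "",
    ]
    for i, (snippet, act) in enumerate(top_examples, 1):
        lines.append(f"Example {i} (activation {act:.2f}): {snippet}")
    lines.append("")
    lines.append("In one short phrase, what concept does this feature represent?")
    return "\n".join(lines)

examples = [
    ("The <<cat>> sat on the mat.", 8.31),
    ("A stray <<cat>> wandered in.", 7.95),
]
prompt = build_autointerp_prompt(examples)
# The prompt would then be sent to a model such as GPT-4, and the returned
# phrase stored as the feature's explanation.
```

In practice the returned explanation is typically scored by checking how well it predicts the feature's activations on held-out text, but the prompt-construction step above is the core of the compute cost per feature.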

How will this funding be used?

We will spend all of it on either feature dashboards or auto-interp explanations. We expect to be cost-neutral on this - i.e., the funding goes entirely to compute.

Who is on your team and what's your track record on similar projects?

Joseph Bloom, Curt Tigges, Johnny Lin of Decode Research. We have generated ~20% of the dashboards already.

What are the most likely causes and outcomes if this project fails? (premortem)

This is unlikely to fail unless something highly unusual happens, such as AI compute prices suddenly rising to 10x the expected level.

What other funding are you or your project getting?

This is a targeted grant for a specific task. Our project has received funding from various other sources, but none of it was earmarked for this specific task.

Similar projects

- Glen M. Taggart - Independent research to improve SAEs (4-6 months): By rapid iteration on possible alternative architectures & training techniques. (Technical AI safety, $55K raised)
- Johnny Lin - Neuronpedia - Open Interpretability Platform: Platform for interpretability researchers, especially those creating/using Sparse Autoencoders. (Technical AI safety, $2.5K raised)
- Kunvar Thaman - Exploring feature interactions in transformer LLMs through sparse autoencoders. (Technical AI safety, $8.5K raised)
- Matthew A. Clarke - Salaries for SAE Co-occurrence Project: Working title - “Compositionality and Ambiguity: Latent Co-occurrence and Interpretable Subspaces”. (Science & technology, Technical AI safety, $0 raised)
- Ethan Josean Perez - Compute and other expenses for LLM alignment research: 4 different projects (finding RLHF alignment failures, debate, improving CoT faithfulness, and model organisms). (Technical AI safety, $400K raised)
- Jesse Hoogland - Scoping Developmental Interpretability: 6-month funding for a team of researchers to assess a novel AI alignment research agenda that studies how structure forms in neural networks. (Technical AI safety, $145K raised)
- Zhonghao He - Mapping neuroscience and mechanistic interpretability: Surveying neuroscience for tools to analyze and understand neural networks and building a natural science of deep learning. (Technical AI safety, $5.95K raised)
- Alex Cloud - Compute for 4 MATS scholars to rapidly scale promising new method pre-ICLR. (Technical AI safety, $16K raised)