I am an independent researcher and language model tinkerer who builds and writes about open-source tools, like the repeng steering vector library, for experimenting with and understanding language models. This work is currently a side project; I'd like to dedicate more time to it and produce useful artifacts for new techniques, like Model Diff Amplification.
My current interest is Model Diff Amplification (https://www.goodfire.ai/research/model-diff-amplification), and whether it would be possible to build tools to make experimenting with MDA and combining it with other techniques (like steering vectors or on-policy distillation) easier. Additionally, with more funding, I would like to invest more time into steering vector infrastructure, such as by implementing a steering vector plugin for vLLM.
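The core of MDA, as described in the linked Goodfire post, is sampling from logits that exaggerate the difference between a fine-tuned model and its base model. A minimal NumPy sketch of that logit arithmetic (function names, the toy vocabulary, and the alpha value are all illustrative, not repeng or Goodfire API):

```python
import numpy as np

def amplified_logits(base_logits, tuned_logits, alpha):
    # MDA: push next-token logits further in the direction the fine-tune
    # already moved them, so rare fine-tuned behaviors surface more often
    return base_logits + alpha * (tuned_logits - base_logits)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# toy 4-token vocabulary; the fine-tune mildly boosts token 2
base = np.array([2.0, 1.0, 0.5, 0.0])
tuned = np.array([2.0, 1.0, 1.5, 0.0])

# with alpha=4, token 2's logit becomes 0.5 + 4 * 1.0 = 4.5,
# making it the most likely next token under the amplified distribution
amp = amplified_logits(base, tuned, alpha=4.0)
```

In practice this requires running both models per decoding step (or a fused implementation), which is why a vLLM plugin would matter for performance.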
The minimum funding amount ($5k) will cover my time to write a tutorial on Model Diff Amplification and to build tooling for it that works with Hugging Face models and my logitloom visualization tool (see below).
The second tier ($10k) will also cover an attempt to implement an MDA plugin for vLLM, which would enable higher-performance MDA sampling and potentially use in RL frameworks, e.g. for distilling an MDA policy into a single model. (I believe this will be useful for model behavior research - for example, producing reward hacking or emergent misalignment model organisms, then testing mitigation techniques on the resulting single model.) If this turns out not to be feasible with vLLM's interfaces (which seems unlikely), I will put this funding towards steering vector infrastructure or open-ended research.
The third tier ($20k+) will go towards steering vector infrastructure, such as a steering vector plugin for vLLM. There is an existing project implementing steering for vLLM, EasySteer (which uses some code from repeng), but it is poorly documented and relies on a fork of vLLM rather than the plugin system. (Using a plugin would make the system easier to integrate with other vLLM-based tools, like verifiers.) If this turns out not to be feasible with vLLM's plugin system, I will put this funding towards contributions to EasySteer, closing open PRs and issues on repeng, and/or open-ended research.
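Conceptually, applying a steering vector means adding a learned direction to the hidden states at one or more layers during the forward pass (in a real model this runs inside a forward hook or, for vLLM, a plugin). A minimal sketch of the per-layer operation, with illustrative shapes and coefficient:

```python
import numpy as np

def steer(hidden, vector, coeff):
    # add a scaled steering vector to every token position's hidden
    # state at a chosen layer; broadcasting applies it across seq_len
    return hidden + coeff * vector

rng = np.random.default_rng(0)
hidden = rng.standard_normal((5, 8))   # (seq_len, hidden_dim)
vector = rng.standard_normal(8)        # one learned direction
steered = steer(hidden, vector, coeff=1.5)
```

The engineering work is not this arithmetic but wiring it into an inference server's layer internals without forking the server, which is what a plugin-based design would solve.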
Additional funding will go towards funding more open-ended model behavior research, open-source development, and writing. (If you are interested in funding me in this range and have certain topics you would like me to prioritize that were not otherwise mentioned, please get in touch.) Likely topics: impact of model personas on RL, RL for personas / character training, using steering vectors to better understand model personas, understanding "emergent misalignment", model introspection.
I expect the timeline here to range from 1-2 months for the minimum funding amount, to 6-12 months at the maximum end.
I have been tinkering with LLMs for several years now, and in that time have produced several useful open-source tools and tutorials / blog posts:
https://github.com/vgel/repeng, an open-source library for generating steering / control / concept / persona vectors
A popular (65k clicks) blog post introducing people to how to use these vectors: https://vgel.me/posts/representation-engineering/
Used in various research contexts, e.g. https://arxiv.org/abs/2511.01689
A slew of example notebooks, including less common use cases, like training model-contrastive vectors, that other resources don't usually cover: https://github.com/vgel/repeng/tree/main/notebooks
Previously completed a $5k grant to contribute a steering vector implementation to llama.cpp: https://x.com/ggerganov/status/1768357345032118715
Ran an in-person repeng workshop at Lighthaven
https://github.com/vgel/logitloom, a free and open-source tool for understanding model behavior at the logit level
https://vgel.me/posts/seahorse/, a widely-read (145k clicks) exploration of why the seahorse emoji causes models to act strangely, using the logit lens (an interpretability technique)
I have previously received a $10k credit grant for cloud compute from Prime Intellect.
Austin Chen
1 day ago
Approving this project, and making a small donation myself -- I've seen Theia's work before and was excited to see this pop up on Manifund. Thanks @joel_bkr and Charles for making this happen!
Joel Becker
2 days ago
I asked Theia to put this proposal up, on the recommendation of my excellent and very well-read colleague Charles Foster.
The thesis here is: get small grants to high-context model tinkerers who are AI risk skeptics -> they take time away from their jobs to do indie research and writing projects -> they blog about what they find -> those blog posts get read by folks within the AIS world and AIS-skeptic world -> everybody converges to truer beliefs.
I'm excited to support Theia as part of a bet on this thesis!