Travel funding to present research at ICLR 2025

Science & technology · Technical AI safety

Georg Lange

Not funded · Grant · $0 raised

Project summary

My paper has been accepted at ICLR 2025, and I am seeking travel funding to present it there.

We started working on this paper during the MATS 4.1 extension program and continued afterwards on our own savings. I therefore have no institution or grant that could cover conference attendance.

The paper's topic is technical AI safety and mechanistic interpretability for LLMs. More specifically, Sparse Autoencoders (SAEs) are a recent method for extracting human-interpretable features from LLM representations; they could be useful for discovering deception or other undesired behavior, and for improving model robustness, controllability, and debugging. However, realistic ground-truth evaluations of SAEs have been lacking.
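For context, a standard SAE is a one-hidden-layer autoencoder trained on LLM activations: a linear encoder with a ReLU produces sparse feature activations, and a linear decoder reconstructs the original activation, with the loss trading off reconstruction error against an L1 sparsity penalty. The sketch below is illustrative only; the layer sizes and sparsity coefficient are placeholder values, not the configuration from our paper.

```python
# Minimal sketch of a standard sparse autoencoder (SAE) on LLM activations.
# Sizes and the L1 coefficient are illustrative, not the paper's exact setup.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)  # activations -> feature space
        self.decoder = nn.Linear(d_hidden, d_model)  # features -> reconstruction

    def forward(self, x: torch.Tensor):
        f = torch.relu(self.encoder(x))  # sparse, non-negative feature activations
        x_hat = self.decoder(f)          # reconstruction of the input activation
        return x_hat, f

sae = SparseAutoencoder(d_model=768, d_hidden=768 * 8)
x = torch.randn(32, 768)  # stand-in for a batch of LLM residual-stream activations
x_hat, f = sae(x)
loss = ((x - x_hat) ** 2).mean() + 1e-3 * f.abs().mean()  # reconstruction + L1 sparsity
loss.backward()
```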

In the paper, we propose and test a new method for evaluating SAEs, and we show the following results that may be valuable for further research in technical AI safety:

  • SAE features that have a clear interpretation are often not great for control, so researchers should be more cautious when interpreting their results.

  • SAEs trained on real data perform much worse than SAEs trained on toy or task-specific data, but some improvements in SAE architecture alleviate some of these issues.

  • We characterize other SAE problems such as feature splitting, feature magnitudes, and occlusion.

Altogether, these results improve our ability to realistically evaluate SAE-based interpretability methods, preventing potential misinterpretations of model features and thereby contributing to safer and more robust AI systems.

What are this project's goals? How will you achieve them?

  • Prepare the accompanying poster and present it at ICLR 2025.

  • Present the paper to other AI safety researchers who use SAEs in their work, raise awareness of SAE pitfalls, and show them how to apply the evaluations our paper introduces.

  • As a secondary goal, I am looking for opportunities to continue working in technical AI safety, i.e. jobs at research labs, non-profits, or academic labs. Attending the conference would enable me to network, explore collaborations, and engage directly with other researchers.

How will this funding be used?

  • $900: ICLR 2025 attendance fee

  • $1,050: basic economy flights, Frankfurt to Singapore

  • $900: hotel

In total, I am requesting $2,850.

Who is on your team? What's your track record on similar projects?

I am applying for this grant alone. The paper's other first author is Alex Makelov, and the project was mentored by Neel Nanda.

I have previously presented at other conferences with great engagement, for example at ICLR 2024 and SfN. The arXiv preprint of this paper has 28 citations on Google Scholar.

What are the most likely causes and outcomes if this project fails?

Without funding, I could not present the paper at ICLR 2025, reducing the visibility of our proposed evaluation methods and of our critical insights into SAE limitations. Consequently, AI safety and mechanistic interpretability researchers might continue to rely on inadequately evaluated SAE features, potentially overstating their reliability and interpretability.

How much money have you raised in the last 12 months, and from where?

I haven't raised money in the last 12 months.

Similar projects

  • Sheikh Abdur Raheem Ali: [Urgent] Travel funding to present research at ICLR and GovAI Summit (Technical AI safety, AI governance, Global catastrophic risks). $0 raised.

  • Evžen Wybitul: Presenting a poster at the ICML technical AI governance workshop (Technical AI safety, AI governance). $0 / $2.5K raised.

  • Matthew Farr: [Urgent] Top-up funding to present poster at the Tokyo AI Safety Conference. My allocated travel funding is insufficient; seeking extra funding for flights, accommodation, etc., to present a poster and network. (Science & technology, Technical AI safety). $700 raised.

  • Joss Oliver: Travel funding to the International Conference on Algorithmic Decision Theory. $1.17K raised.

  • Jay Luong: Help me to attend a spring school on AI that I am co-organising. Travel/accommodation support for a lead co-organiser of the Ethos+Tékhnē spring school. (Science & technology, Technical AI safety, AI governance, Global catastrophic risks). $0 raised.

  • Lucy Farnik: Discovering latent goals (mechanistic interpretability PhD salary). 6-month salary for interpretability research focusing on probing for goals and "agency" inside large language models. (Technical AI safety). $1.59K raised.

  • Max Kaufmann: Travel grant to present AI safety paper at ACM FAccT (Technical AI safety). $1.65K raised.