We’ve built one of Israel’s most active and impactful research communities focused on understanding and interpreting neural networks. Our uniquely interdisciplinary approach brings together AI researchers, neuroscientists, and world-class cybersecurity experts to push the frontiers of interpretability, control, and robustness in machine learning systems.
Our grassroots community has grown from a small reading group to a 500+ member collective over three years. We've held more than 40 reading sessions and facilitated interdisciplinary research, resulting in publications, collaborations, and a vibrant ecosystem of knowledge-sharing and mentorship.
Our members have initiated bottom-up projects such as jointly taking online courses, collaborating on research, and running regular meetings that provide mentorship, technical assistance, and a space for exchanging ideas. We have also funded travel for 10 participants to attend major conferences, broadening their exposure and strengthening their global connections.
Our collaborative efforts have already led to several research outputs, including three papers authored by community members:
Interpreting the Repeated Token Phenomenon in Large Language Models (ICML 2025, Itay Yona, Yossi Gandelsman)
Superscopes: Amplifying Internal Feature Representations for Language Model Interpretation (preprint, Jonathan Jacobi, Gal Niv)
Threshold Hints for Early Exit in Reasoning Models: Efficient and Dynamic Stopping Conditions (under review, Amir Sarid)
We are also collaborating on three additional papers with leading Israeli professors tackling core challenges in AI safety.
The community already exists: it's active, passionate, and growing. But to sustain its momentum and scale our efforts, especially in deepening member engagement, we need support. Our goal is to strengthen and expand what we’ve built together: a diverse ecosystem that fosters connection, collaboration, inspiration, knowledge-sharing, mentorship, support, and career opportunities, as reflected in the feedback we’ve recently received from members.
This funding will enable us to scale our forum and impact:
Hire a part-time coordinator to manage and grow the community
Establish a legal and organizational foundation
Fund in-person meetings and networking events
Expand compute resources for research collaborations
Support member travel to academic conferences
Itay Yona (ex-DeepMind) – I founded this community three years ago and have led it since: organizing research meetings, setting the agenda, leading hackathons and initiatives, applying for grants, and managing team growth. My background spans AI, cybersecurity, adversarial machine learning, interpretability, and neuroscience.
In addition to myself, the core team includes Dan Barzilay, who leads our research mentorship efforts, and Yanir Marmor, who has helped coordinate logistics and partnerships. Together, we’ve built a sustainable structure for running events, guiding research projects, and supporting new members.
While our community is resilient and curiosity-driven, the absence of funding would significantly limit our ability to grow, support new talent, and maintain our momentum. Without resources for coordination, compute, and in-person collaboration, we risk slowing (or even stalling) high-potential research at a critical time for AI interpretability and safety.
We previously secured $40,000 from Effective Ventures (Long-Term Future Fund), which has supported our activities over the past 18 months and is set to conclude by August 31st. That funding enabled travel to conferences, core event infrastructure, and early-stage research support.