What are this project's goals? How will you achieve them?
Make interpretability research on and with agents faster and less painful, reducing the gap between having a hypothesis and testing it
Enable researchers to bring their own techniques (SAEs, steering, activation extraction, etc.) into agentic workflows without rebuilding infrastructure each time
Support replication and extension of existing interp papers by letting agents clone and operate inside paper repos directly
Achieve this through a small, hackable open-source library built on Modal (for remote GPU sandboxes) and an IPython kernel execution layer for observability and iterative work
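To make the "IPython kernel execution layer" idea above concrete, here is a minimal sketch of a persistent execution session that an agent could drive. This is an illustrative assumption, not Seer's actual API: the class name, methods, and variables are invented for this example, and it uses only the standard-library `code` module so it runs anywhere. A real version would wrap an IPython kernel inside a Modal GPU sandbox.

```python
# Hypothetical sketch of a persistent execution layer (names are
# illustrative, NOT Seer's real interface). Uses only the stdlib.
import code
import contextlib
import io

class PersistentSession:
    """Keeps one namespace alive across submissions, so an agent can
    define variables in one step and inspect them in a later step."""

    def __init__(self):
        self.namespace = {}
        self.interp = code.InteractiveInterpreter(self.namespace)

    def run(self, source: str) -> str:
        """Execute a snippet in the shared namespace; return captured stdout."""
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            self.interp.runsource(source, symbol="exec")
        return buf.getvalue()

session = PersistentSession()
session.run("acts = [0.1, 0.9, 0.4]")      # e.g. activations from one step
out = session.run("print(max(acts))")      # state persists across calls
```

The point of the kernel-style design, as opposed to running each agent action as a fresh subprocess, is exactly this persistence: expensive state (loaded models, cached activations) survives between steps, which is what makes iterative hypothesis-testing fast.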
How will this funding be used?
Dedicated development time to add features, improve stability, and expand model-provider support (Gemini and OpenAI have already been added alongside Claude)
Modal compute costs for running and validating experiments
Documentation and tutorials to help the interp research community onboard quickly
Outreach to MATS scholars and other researchers to gather feedback and drive adoption
Who is on your team? What's your track record on similar projects?
Primary author: Arya Jakkli, with 4 additional contributors (thienkhoi01, jnward, angkul07, mbiss10)
Already shipped working case studies: hidden preference investigation, checkpoint diffing with SAEs, introspection experiment replication on Gemma 3 27B, Petri-style auditing agent
135 GitHub stars and 9 forks within ~5 months — organic traction from the interp community
Built on and acknowledged by Goodfire (Scribe) and Modal; feedback from MATS scholars
What are the most likely causes and outcomes if this project fails?
Low adoption: researchers stick with ad-hoc Claude Code + notebook setups; Seer's abstractions don't fit their workflows → project goes unmaintained, no harm beyond lost time
Contributor burnout: small team, currently 1 primary author; without funding, development stalls before the library reaches a stable, widely-used state
Infrastructure dependency risk: heavy reliance on Modal means if Modal's pricing or API changes, significant rework would be needed
Superseded: a larger lab ships similar tooling with more resources → Seer becomes redundant, though early experiments and design decisions may still inform better tools
How much money have you raised in the last 12 months, and from where?
No external funding raised to date
Modal provided compute credits (new accounts receive $30 USD, and additional credits were likely granted given the project's usage)
This grant would be the first formal funding for the project