Main points in favor of this grant
Developmental interpretability seems like a potentially promising and relatively underexplored research direction for exploring neural network generalization and inductive biases. Hopefully, this research can complement low-level or probe-based approaches for neural network interpretability and eventually help predict, explain, and steer dangerous AI capabilities such as learned optimization and deceptive alignment.
Jesse made a strong, positive impression on me as a scholar in the SERI MATS Winter 2022-23 Cohort; his research was impressive and he engaged well with criticism and others scholars' diverse research projects. His mentor, Evan Hubinger, endorsed his research at the time and obviously continues to do, as indicated by his recent regrant. While Jesse is relatively young to steer a research team, he has strong endorsements and support from Dan Murfet, David Krueger, Evan Hubinger, and other researchers, and has displayed impressive enterpeneurship in launching Timaeus and organizing the SLT summits.
I recently met Dan Murfet at EAGxAustralia 2023 and was impressed by his research presentation skills, engagement with AI safety, and determination to build the first dedicated academic AI safety lab in Australia. Dan seems like a great research lead for the University of Melbourne lab, where much of this research will be based.
Australia has produced many top ML and AI safety researchers, but has so far lacked a dedicated AI safety organization to leverage local talent. I believe that we need more AI safety hubs, especially in academic institutions, and I see Timaeus (although remote) and the University of Melbourne as strong contenders.
Developmental interpretability seems like an ideal research vehicle to leverage underutilized physics and mathematics talent for AI safety. Jesse is a former physicist and Dan is a mathematician who previously specialized in algebraic geometry. In my experience as Co-Director of MATS, I have realized that many former physicists and mathematicians are deeply interested in AI safety, but lack a transitionary route to adapt their skills to the challenge.
Other funders (e.g., Open Phil, SFF) seem more reluctant (or at least slower) to fund this project than Manifund or Lightspeed and Jesse/Dan told me that they would need more funds within a week if they were going to hire another RA. I believe that this $20k is a high-expected value investment in reducing the stress associated with founding a potentially promising new AI safety organization and will allow Jesse/Dan to produce more exploratory research early to ascertain the value of SLT for AI safety.
Donor's main reservations
I have read several of Jesse's and Dan's posts about SLT and Dev Interp and watched several of their talks, but still feel that I don't entirely grasp the research direction. I could spend further time on this, but I feel more than confident enough to recommend $20k.
Jesse is relatively young to run a research organization and Dan is relatively new to AI safety research; however, they seem more than capable for my level of risk tolerance with $20k, even with my current $50k pot.
The University of Melbourne may not be an ideal (or supportive) home for this research team; however, Timaeus already plans to be somewhat remote and several fiscal sponsors (e.g., Rethink Priorities Special Projects, BERI, Ashgro) would likely be willing to support their researchers.
Process for deciding amount
I chose to donate $20k because Jesse said that a single paper would cost $40k (roughly 1 RA-year) and my budget is limited. I encourage further regrantors to join me and fund another half-paper!
Conflicts of interest
Jesse was a scholar in the program I co-lead, but I do not believe that this constitutes a conflict of interest.