Overall: I recommend funding this to at least ~$240K, the level needed for the Seminar + 1-year fellowship.
I researched AGI alignment at MIRI for about 7 years; in my judgement, the field is generally not well set up to appropriately push newcomers to work on the important, difficult core problems of alignment. Personally, my guess is that AGI alignment is too hard for humans to solve at all any time soon. But if I were wrong about that, I would probably still think that novel, deep technical philosophy about minds would be a prerequisite. I'm not up to date, so this impression might be partly incorrect, but broadly my belief is that most AI safety training programs do not create a context where people have the space, and are spurred, to think about those core problems.
Since this program is new, it's hard to judge. I've worked with Mateusz on alignment research and I think he gets the problem; the description of the program seems about as promising as any I've seen. Because the space hasn't found great traction yet, trying new things is especially valuable. So, IF you want to fund AGI alignment research, this should probably be among your top investments.
Further, if you want to fund this program, I'd strongly recommend funding it at least up to the minimum needed to continue it into the 1-year fellowship. The reason is that learning to approach the actual AGI alignment problem is a slow process that probably takes multiple years, with sparse but non-zero feedback; so the foundations laid down in the month-long seminar would tend to partly go to waste without longer-lasting scaffolding.
stable working pods of three to five people
I would suggest creating space for even smaller groups (the standard in yeshiva study, I gather, is the pair, and personally I need substantial time/space set aside for solo thinking). The field badly needs people growing their own inside-view perspectives, so making some space for those to develop is worth it even given the opportunity cost. You could try to offload that to before and after the program, but I'd suggest also making space for it during the program itself, e.g. a "Schelling" time for 2-hour solo walks/thinks, or whatever.
We actually consider it very likely that the project "fails" in the sense that it will complete with none of the Fellows producing any clearly promising research outputs or directions at building pieces of a solution. The reason/cause of this would be that the problem being tackled is one of great difficulty, very slippery, and with difficult feedback loops with reality.
This is an unbelievably based statement. On the object level, it would hopefully contribute to making an environment where actual new perspectives (rather than just the Outside the Box Box: https://www.lesswrong.com/posts/qu95AwSrKqQSo4fCY/the-outside-the-box-box) can grow, and it furthermore indicates some degree of hopeworthiness of the organizers on that dimension.
participants will share their learning with each other through structured showcases and peer instruction
Sounds cool, but do keep in mind that this could also create a social pressure to "publish or perish", so to speak, leading to Goodharting. A not-great solution is to make it optional or whatever; it's not great because it's sort of just lowering standards, and presumably you do want people aiming to work hard and do the thing. Maybe there are better solutions, such as somehow explicitly, and in common knowledge, making it "count for full points" to present on "here's how I have a really basic/fundamental question, and here's how I kept staring at that question even though it's awkward to keep staring at one thing and not have publishable technical results from it, and here are my thoughts in orienting to that question, and here's specifically why I'm not satisfied with some obvious answers you might give". Or something. In other words, alter the shape of the landscape, rather than making it less steep.
Selection criteria for the fellows:
I would suggest somewhat upweighting something like "security mindset", or (in the same blob) something like "really gets that you can have a plausible hypothesis, but it's wrong, and you could have quickly figured out that it's wrong by actually trying to falsify it / find flaws in it, but you probably wouldn't have quickly figured out that it's wrong just by bopping around by default". And/or trying to bop people on the head to notice that this is a thing, though IDK how to do that. This is especially needed because, since we don't get exogenous feedback about the objects in question, we have to construct our own feedback (i.e. logical reasoning about strong minds).