As part of my evals field-building efforts, I recently asked people who want to build something in the space to fill out a simple form (https://www.lesswrong.com/posts/uoAAGzn4hPAHLeG2Y/which-evals-resources-would-be-good).
This project was, in my opinion, the second-best submission (after James's project: https://manifund.org/projects/more-detailed-cyber-kill-chain-for-ai-control?tab=comments). My thinking is roughly:
1. The project itself sounds very reasonable to me. Modeling capabilities with latent variable models seems clearly worth trying. In the best case, this work will be as impactful as observational scaling laws (https://arxiv.org/abs/2405.10938); in the worst case, the method is too hard to get working in practice or doesn't provide a clear benefit over simpler analysis tools. In expectation, I don't think the research will match the impact of observational scaling laws (though it has the chance to, which is already a high bar), but I do expect it to be useful to a decent number of evaluators like Apollo, METR, AISIs, etc. Furthermore, I think the project will be conceptually valuable, partly because building it forces you to think about which components a Bayesian model of capabilities should have, and partly because I expect the findings themselves to be insightful. Concretely, I expect the analysis of the latent variables to be quite interesting.
2. I don't know Laurence or Desi personally, but their work so far seems reasonable at first glance, and they are clearly more experienced researchers than the average MATS scholar.
3. The project salaries are already covered; this grant is purely for compute, so it seems more impactful on the margin. I expect many of the most interesting results will come from compute-heavy runs, so this grant might unlock "the most interesting stuff".
4. Finally, I think the science of evals is perfectly suited for academics: it often doesn't require access to the biggest models, many of the classic academic research skills transfer directly, and the results are useful for the entire field. Thus, I have a general intuition that we should try harder to fund academics to work on the science of evals, and I'm surprised this isn't already happening more.
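To make point 1 concrete, here is a toy sketch of the kind of latent variable model I have in mind; this is my own illustration, not the grantees' actual method. It's a one-dimensional IRT-style model: each model gets a latent "ability" and each task a latent "difficulty", and the probability that a model solves a task is a sigmoid of ability minus difficulty. We fit both latents to an observed model-by-task accuracy matrix by gradient descent and check that the recovered abilities track the true ones.

```python
import numpy as np

rng = np.random.default_rng(0)

n_models, n_tasks = 8, 200
theta_true = rng.normal(0, 1.5, n_models)  # latent ability per model
b_true = rng.normal(0, 1.5, n_tasks)       # latent difficulty per task

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Observed accuracy matrix: P(model m solves task t) = sigmoid(theta_m - b_t)
acc = sigmoid(theta_true[:, None] - b_true[None, :])

# Fit theta and b by gradient descent on binary cross-entropy.
theta = np.zeros(n_models)
b = np.zeros(n_tasks)
lr = 0.5
for _ in range(2000):
    p = sigmoid(theta[:, None] - b[None, :])
    err = p - acc                       # d(BCE)/d(logit), logit = theta_m - b_t
    theta -= lr * err.mean(axis=1)      # d(logit)/d(theta_m) = +1
    b -= lr * (-err).mean(axis=0)       # d(logit)/d(b_t) = -1
    theta -= theta.mean()               # pin the scale: latents are only
                                        # identified up to a common shift

# Recovered abilities should correlate strongly with the true ones
print(np.corrcoef(theta, theta_true)[0, 1])
```

Even this toy version shows why the analysis could be interesting for evaluators: once fitted, the per-task difficulties tell you which benchmark items actually discriminate between frontier models, and the per-model abilities give a benchmark-agnostic capability estimate. A realistic version would presumably be multidimensional and Bayesian (posteriors over the latents rather than point estimates).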