Are you also hoping to revive the journal model? Or are you planning to focus exclusively on supporting self-published articles?
If the former, I'm worried that this proposal does not address the points made in the hiatus article against this theory of change. If the latter, I'm worried that jumping straight into tool development might distract you from the broader picture (where are the specific bottlenecks? which use cases do you want to support? how can you get researchers to adopt these workflows?).
As for the technical aspects of the proposal:
Creating such visually rich articles can be accelerated by building AI tools such as:
- a code-completion foundation model (FM) for generating math animations (using the Manim library created by 3Blue1Brown);
- an image-generation multimodal FM for going from sketches to diagrams, with editing of generated diagrams via user-created masks and text prompts.
Is it really necessary to finetune a new model? Can't you build on the existing Manim capabilities of SOTA LLMs (e.g. through RAG or a self-documenting system prompt)? In my brief experience trying to generate Manim code, LLMs tend to make very straightforward mistakes that could be fixed just by giving them better context on the library.
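To illustrate what I mean by "better context", here is a minimal sketch of the system-prompt option. I'm assuming the OpenAI Python SDK here, and the model name and the hand-curated cheatsheet file are placeholders for whatever stack you'd actually use:

```python
# Minimal sketch: give a general-purpose LLM curated Manim context instead of finetuning.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment; the model name
# and the local docs file are placeholders.
from pathlib import Path
from openai import OpenAI

client = OpenAI()

# Hand-curated excerpts of the Manim Community docs plus common pitfalls (hypothetical file).
manim_context = Path("manim_cheatsheet.md").read_text()

SYSTEM_PROMPT = f"""You write Manim Community Edition scenes.
Follow the API excerpts below exactly; do not invent methods.

{manim_context}
"""

def generate_scene(description: str) -> str:
    """Return Manim scene code for a natural-language description."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any strong code model would do
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Write a single Scene that shows: {description}"},
        ],
    )
    return response.choices[0].message.content

print(generate_scene("a unit circle morphing into a square"))
```

A RAG variant would just swap the static cheatsheet for retrieval of whichever doc sections are relevant to each request.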
As for image generation: are you hoping to get the diagram images as output? Or is the idea to use the model as part of a pipeline going from a sketch to some kind of renderable spec (e.g. Mermaid)? Naively, I don't see how either approach could be reliable with current models, but I might be missing something.
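If it's the latter, here is roughly the pipeline I'm picturing, where the model only has to emit a textual spec and the Mermaid renderer does the actual drawing deterministically. Again, the model name and file paths are placeholders:

```python
# Minimal sketch of a "sketch -> renderable spec" pipeline: a vision-capable model
# transcribes a hand-drawn diagram into Mermaid, which is then rendered by standard tooling.
import base64
from openai import OpenAI

client = OpenAI()

def sketch_to_mermaid(image_path: str) -> str:
    """Ask a vision-capable model to transcribe a hand-drawn diagram into Mermaid."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Transcribe this hand-drawn diagram into a Mermaid flowchart. "
                         "Output only the Mermaid code."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# The returned spec (not pixels) is what the author would edit and re-render,
# e.g. with mermaid-cli, which may make mask-based image editing unnecessary.
print(sketch_to_mermaid("whiteboard_sketch.png"))
```

My worry is about the reliability of the transcription step itself, not the rendering, so it would help to know which of these two outputs you're actually targeting.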