Are you also hoping to revive the journal model? Or are you planning to focus exclusively on supporting self-published articles?
If the former, I'm worried that this proposal does not address the points made in the hiatus article against this theory of change. If the latter, I'm worried that jumping straight into tool development might distract you from the broader picture (where are the specific bottlenecks? which use cases do you want to support? how can you get researchers to adopt these workflows?).
As for the technical aspects of the proposal:
Creating such visually rich articles can be accelerated by building AI tools such as:
- a code-completion foundation model (FM) for generating math animations (using the Manim library created by 3Blue1Brown);
- an image-generation multimodal FM for going from sketches to diagrams, with editing of generated diagrams via user-created masks and text prompts.
Is it really necessary to finetune a new model? Can't you build on the existing Manim capabilities of SOTA LLMs (e.g. through RAG or a self-documenting system prompt)? In my brief experience trying to generate Manim code, LLMs tend to make very straightforward mistakes that could be fixed just by giving them better context on the library.
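To illustrate what I mean by "better context", here is a minimal sketch of the system-prompt option. I'm assuming the OpenAI Python SDK here, and the model name and the hand-curated cheatsheet file are placeholders for whatever stack you'd actually use:

```python
# Minimal sketch: give a general-purpose LLM curated Manim context instead of finetuning.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment; the model name
# and the local docs file are placeholders.
from pathlib import Path
from openai import OpenAI

client = OpenAI()

# Hand-curated excerpts of the Manim Community docs plus common pitfalls (hypothetical file).
manim_context = Path("manim_cheatsheet.md").read_text()

SYSTEM_PROMPT = f"""You write Manim Community Edition scenes.
Follow the API excerpts below exactly; do not invent methods.

{manim_context}
"""

def generate_scene(description: str) -> str:
    """Return Manim scene code for a natural-language description."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any strong code model would do
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Write a single Scene that shows: {description}"},
        ],
    )
    return response.choices[0].message.content

print(generate_scene("a unit circle morphing into a square"))
```

A RAG variant would just swap the static cheatsheet for retrieval of whichever doc sections are relevant to each request.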
As for image generation: are you hoping to get the diagram images as output? Or is the idea to use the model as part of a pipeline going from a sketch to some kind of renderable spec (e.g. Mermaid)? Naively, I don't see how either approach could be reliable with current models, but I might be missing something.
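If it's the latter, here is roughly the pipeline I'm picturing, where the model only has to emit a textual spec and the Mermaid renderer does the actual drawing deterministically. Again, the model name and file paths are placeholders:

```python
# Minimal sketch of a "sketch -> renderable spec" pipeline: a vision-capable model
# transcribes a hand-drawn diagram into Mermaid, which is then rendered by standard tooling.
import base64
from openai import OpenAI

client = OpenAI()

def sketch_to_mermaid(image_path: str) -> str:
    """Ask a vision-capable model to transcribe a hand-drawn diagram into Mermaid."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Transcribe this hand-drawn diagram into a Mermaid flowchart. "
                         "Output only the Mermaid code."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# The returned spec (not pixels) is what the author would edit and re-render,
# e.g. with mermaid-cli, which may make mask-based image editing unnecessary.
print(sketch_to_mermaid("whiteboard_sketch.png"))
```

My worry is about the reliability of the transcription step itself, not the rendering, so it would help to know which of these two outputs you're actually targeting.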