AI Safety research is currently an "ivory tower" domain. Models are being deployed to millions of users, yet technical understanding of how they fail (misalignment, reward hacking, deceptive behavior) remains concentrated in a small circle of PhDs. We lack scalable safety-education infrastructure to train the developers and policy-makers who will govern these systems.
I have already launched aiunderstanding.org, a live platform with 130+ technical guides. I am now building "Safety Playgrounds": a suite of interactive, LLM-powered simulations. Unlike static text, these playgrounds have users "jailbreak" or "misalign" sandboxed models themselves, so they observe each failure mode firsthand and in real time.
Minimum Funding ($50,000): This covers a 6-month full-time development stipend for me, scaling of core infrastructure, and the launch of the first 10 "Safety Playground" modules.
Funding Goal ($100,000): This level lets us scale significantly in two ways: hiring a part-time technical researcher to keep modules current with frontier research, and purchasing API and compute credits so the playgrounds stay free and open to a global audience.
Phase 1 (Months 1-2): Launch simulations for Reward Hacking, Prompt Injection, and Goal Misgeneralization.
Phase 2 (Months 3-4): Reach 10,000 unique monthly users and launch a "Safety Certification" for developers.
Phase 3 (Month 6): Open-source the simulation framework so that other research labs can contribute their own demos.