Sean Kwon
Open source agent monitoring tools to detect failures, infinite loops, and unsafe behavior in production AI systems
Dhruv Yadav
Auditing and improving LLM-as-a-judge systems via interpretable aggregation of preferences
A. Abby
181,448 evaluations showing that no production AI model reliably maintains corrections. Expanding coverage and pursuing multi-pass validation.
Sean Peters
An early-stage AI safety research group based in Sydney, Australia
Jason Lee Harvey
Establishing a sovereign, decentralized Sentry node to audit frontier AI agents for logic escapes ($E_{escape}$) using the Inverted Social Drift ($SD$) metric.
Dmitri Costenco
Anuar Kiryataim Contreras Malagón
Independent behavioral replication of welfare and answer-thrashing findings from Anthropic's April 2026 frontier system card, by a researcher whose prior corpus
Zaelani
18+ preprints across multiple fields, all written on a 2GB RAM phone. $600 removes the only thing standing between me and the next body of work.
Aashka Patel
Redirecting India’s Middle‑Schoolers into AI Safety, Governance, and X‑Risk Work
Jonathan Elsworth Eicher
Ahmed dawoud
An advanced agent that perceives your screen and executes tasks by controlling the mouse, acting as a digital proxy to handle complex work on your behalf.
Linh Le
Rishub Jain
Jessica Pu Wang
Germany’s talents are critical to the global effort of reducing catastrophic risks brought by artificial intelligence.
Lawrence Wagner
A benchmark for studying how failures spread across multi-agent AI systems and whether they can be detected and interrupted in time.
Matei-Alexandru Anghel
A Safety Framework for Evaluating AI Humanity Alignment Through Progressive Escalation and Scope Creep
Pedro Bentancour Garin
Runtime safety, oversight, rollback, and control infrastructure for advanced AI in real-world, high-consequence environments.
Wasim Gadwal
Observability and interpretability toolkit for world models in AI safety and mechanistic interpretability research.
Johan Fredrikzon
Designing a Project Funding Proposal
Sung Hun Kwag
An open-source safety pilot for detecting metric gaming, pseudo-improvement, and oversight evasion