AI-Plans.com is a rapidly growing platform for feedback on AI Alignment research.
As of January 2024, there are 100+ alignment plans on the site, with 150+ Critiques.
We hold bi-monthly Critique-a-Thon events, and participation has continued to increase.
The site is useful for several reasons:
- Showcases just how many vulnerabilities there are in all the current alignment plans
- Drastically improves the feedback loop for AI Alignment researchers
- Makes it much easier to contribute to AI Safety research
- Provides credentials for anyone looking to get started in AI Safety (badges and a position on the leaderboard)
On the site, all alignment plans are scored and ranked from highest to lowest, with new plans always starting at the top. Users vote on the critiques rather than on the plans themselves. Plans are then scored by the sum of the scores of Strength Critiques minus the sum of the scores of Vulnerability Critiques.
We use a karmic voting system which gives more weight to votes cast by more trusted (i.e. more upvoted and less downvoted) users. Users are incentivized with a leaderboard and badges.
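For illustration, here is a minimal sketch of how this scoring could be computed. The data model, field names, and exact karma weighting are assumptions made for the example, not the site's actual implementation:

```python
# Hypothetical sketch of the scoring scheme described above.
# The data model, field names, and karma weighting are illustrative
# assumptions, not AI-Plans.com's actual implementation.
from dataclasses import dataclass, field


@dataclass
class Critique:
    kind: str                                  # "strength" or "vulnerability"
    votes: list = field(default_factory=list)  # (voter_karma, +1 or -1) pairs

    def score(self) -> float:
        # Karmic voting: each vote is weighted by the voter's trust level.
        return sum(karma * direction for karma, direction in self.votes)


def plan_score(critiques: list) -> float:
    """Plan score = sum of Strength Critique scores
    minus sum of Vulnerability Critique scores."""
    strengths = sum(c.score() for c in critiques if c.kind == "strength")
    vulnerabilities = sum(c.score() for c in critiques if c.kind == "vulnerability")
    return strengths - vulnerabilities


# Example: one strength critique and one vulnerability critique on a plan.
critiques = [
    Critique("strength", votes=[(2.0, +1), (1.0, +1)]),      # score +3.0
    Critique("vulnerability", votes=[(3.0, +1), (0.5, -1)]),  # score +2.5
]
print(plan_score(critiques))  # 0.5
```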
The author or poster of a plan can iterate on their plan by selecting critiques to address and creating a new version.
A rebuild of the site is currently underway and will add several new features, including:
- Sub-critiques
- Annotations without an additional sign-up (currently, annotating a post requires signing in with a hypothes.is account)
We also run Critique-a-Thons of AI Alignment plans. See the results of the December Critique-a-Thon here: https://aiplans.substack.com/p/december-critique-a-thon-results
We were the first to release detailed critiques of OpenAI's recent Superalignment paper and of the recent DeepMind papers.
AI-Plans.com aims to accelerate AI alignment research via a focused feedback loop of public peer review.
The site is designed to elicit high-quality feedback from an open community of alignment researchers and enthusiasts at a rapid pace. It is easier to write a critique than to develop a plan, and it is easier to vote on a critique than to write one. We leverage this scaling of cognitive effort to produce quantitative and qualitative feedback for researchers. This feedback can also provide insight into the quality of alignment plans produced by various companies and researchers, as well as into the state of alignment more broadly.
The alpha version of the site is live and has already been useful to multiple alignment researchers. We are currently developing the beta release, which includes a more professional design and richer features. Development is no longer talent-constrained, since 6 developers have joined the team.
How will this funding be used?
1. Paying team members for their work
2. Prize funds for Critique-a-Thons
Who is on your team and what's your track record on similar projects?
We have held multiple Critique-a-Thons, which have been highly successful in generating high-quality critiques. (As a side effect, these have also produced broadly useful documents, such as a list of common alignment plan vulnerabilities and a list of ).
Co-founders include:
Kabir – Director, Writer
Nathan – Quality Director, Project Manager
Koi – Cybersecurity specialist and highly experienced backend developer
Marvel – highly talented developer (recently won a hackathon with his team)
The project could potentially fail due to poor coordination within the team or a failure to hone the site’s design toward the mission. However, we consider this very unlikely: the team is coordinating well, and we are strongly committed to sticking to the mission and keeping the site as useful as possible.
AI Safety Strategy provided a $500 prize fund for the first Critique-a-Thon.
A private donor gave £4000 (~$5000) to the team: $2000 went toward prize funds for the second and third Critique-a-Thons and the remainder went towards funding development of the site.