From what I know, AI Safety Careers isn't funding-constrained; how would the funding help with this?
Lead at AI-plans.com (https://ai-plans.com/)
$0 in pending offers
Background: DevOps, Admin & Marketing
Values: Humanity, Agency, Truth
Cause prioritization: Alignment; currently focused on organizing the field and building a way to recognize bad ideas.
2 months ago
I think this could be really useful and the folks at Stampy seem to be doing a lot of good work.
Awesome, thank you very much!! You might be interested to know that the event has not only produced many well-thought-out critiques and got more people involved and interested in AI Safety (especially in what actually goes into making a plan robust), but in the first two days we also produced an extremely useful document: https://docs.google.com/document/d/1GQbAnRPvONF8TdQtQuga4WOLk58iNh3tTdsVyGpA4AE/edit?usp=sharing. Multiple people have said how useful and easy to use this document is, often expressing confusion as to why no one has made something like it before!
Great news! Dr Peter S. Park, an AI Safety postdoc at the Tegmark lab, has agreed to be a judge!
Excited to say that we have 20 participants for the critique-a-thon so far!
1st to 2nd: Making a list of all the ways alignment plans could go wrong.
We'll put together a master list of potential "vulnerabilities" based on existing research and our own ideas. This will give us a checklist to use when evaluating plans.
3rd to 4th: Matching vulnerabilities to plans.
Everyone will pick a few alignment plans to look at more closely. For each plan, you'll label up to 5 vulnerabilities you think could apply and point out evidence from the plan that supports them. Include your level of confidence in each label as a percentage.
5th to 8th: Arguing for and against the vulnerabilities.
You'll team up with another participant and take turns, one defending and the other questioning the vulnerabilities suggested in Step 2. This debate format will help strengthen the critiques. We'll swap sides on the 6th and rotate team members on the 8th.
9th to 10th: Provide feedback on each other's arguments.
Review your partner's reasoning for and against the vulnerability labels. Point out any faulty logic, questionable assumptions, lack of evidence, etc. to improve the critiques.
Step 5: one week of judging.
We'll evaluate submissions and award prizes!
The organizers and outside experts will judge all the critiques based on accuracy, evidence, insight, and communication. Cash prizes will go to the standout critiques that demonstrate top-notch critical analysis.
But if folks want to add more, I'd be happy to increase the prize pool. Though at some point it might make more sense to pay the researchers who are serving as judges.
We've already got 13+ attendees with no prize at all, and I want to maximize the chances of there being a prize.
It helps that one of the consultants on our team is a highly experienced cybersecurity professional and professor.
Also, I kinda love breaking things and alignment plans are sooo vulnerable!
Excited to say that within hours of the announcement, we already have 10 people who've joined the critique-a-thon!
Researchers interested include:
Dr Tom Everitt of DeepMind
Dr Dan Hendrycks of xAI
Dr Roman Yampolskiy
Update: Good news!
Kristen W. Carlson, an alignment researcher at the Institute of Natural Science and Technology, said they liked the site! They also said they found several papers on the site, so it seems to already be proving useful!
A few other researchers have also expressed interest in the site!
Thank you for your comment!
I agree, getting the site used and having good networking is very important!
On that front, there's actually quite a bit of good news! I've been reaching out to researchers for less than a week, and already four alignment researchers are very interested in the site: one has been posting his plans himself, another has asked me to post their plan for them, one has joined the team (Jonathan Ng), and another is working on a plan they're happy to have on the site when it's done!
Esben, the head of Apart Research, is also very interested in the site, and I've spoken with the creator of aisafety.careers, who wants to integrate with the site.
I also had a call with Kat Woods who said she really wanted the site to exist and seemed to think it would provide something very valuable.
It's been very promising to get a really great reception from almost every alignment researcher I've talked to about this. The two sceptics have been folks who either think alignment is impossible, or that it's basically impossible to judge a plan at all since we can't test it. Those are very important points, which I am looking into seriously.