Fazl Barez

Build an AI Safety Lab at Oxford University

Technical AI safety
$0 raised of $50,000 goal
Date created: 08/10/2023

Project summary

I have been offered the opportunity to help set up a safety and interpretability lab within the Torr Vision Group (TVG) at Oxford. This grant provides four months of funding to set up the lab, define its strategy and research goals, and coordinate with collaborators.

With this funding, we will develop Oxford University's first AI safety research lab within the Torr Vision Group (TVG), under the guidance of Prof. Philip Torr and Dr. David Krueger. With the new lab, we aim to set new standards in AI safety and interpretability research.

What are this project's goals and how will you achieve them?


The project’s overall aim is to establish an AI safety research lab within the Torr Vision Group. For this project to be successful over the long run, we have three primary aims:  

  • Establishment: Build a lab dedicated to AI safety and interpretability at the University of Oxford's Department of Engineering Science.

  • Research Agenda: Devise a robust research agenda, leveraging the expertise of lab associates David Krueger and Phil Torr.

  • Funding and Scholarships: Secure governmental funding and scholarships for incoming PhD students to ensure long-term sustainability.

How will this funding be used?


Total Funding Required

  • Total Salary for FB (4 months): $24,000

  • Travel Expenses: $5,000

  • Tax Expenses: $12,000

  • Compute and PA: $6,000

  • Productivity and Buffer: $3,000

Grand Total: $50,000

Who is on your team and what's your track record on similar projects?

Fazl is a soon-to-graduate Ph.D. student with extensive research experience, particularly in AI safety and mechanistic interpretability. Beyond his academic achievements, Fazl's track record of delivering on projects that align with our mission is noteworthy.

  • Apart Research: 

    • Mentored over 10 students, leading to two papers accepted at top ML conferences.

    • Invited speaker at multiple AI and AI safety conferences; mentored at multiple AI safety talent programs.

  • AI Safety Hubs:

    • Played an instrumental role in building communities to foster AI safety research in South Africa and Edinburgh.

    • Collaborated with Yoshua Bengio on three articles on safety.

David Krueger is well known within AI safety; his work spans deep learning, AI alignment, and AI existential risk.

Philip Torr's work spans deep learning, computer vision, and safety, and has been cited more than 82,000 times.

What are the most likely causes and outcomes if this project fails? (premortem)


The mission of reducing existential risk from AI can fail in a few central ways:

Lab establishment issues: A failure during the design and proposal of the lab could lead to the proposal not being accepted. Proposals to UKRI and EPSRC for grants above £10M typically have an acceptance rate below 20%; we believe our team's chances are above 50%.

Inadequate research planning: Pursuing a research path that is not effective in reducing existential risk. We mitigate this risk through our team's focus on safety, reward learning, and mechanistic interpretability.

What other funding are you or your project getting?

Currently we are exploring various sources of funding. In addition to other avenues, we are pursuing a substantial grant from the UK government to ensure a solid foundation for the lab.

Comments (23)
Rachel Weinberg

3 months ago

@mapmeld @vincentweisser @esbenkran @JoshuaDavid since Evan withdrew his donation, which put this project back below the minimum funding bar, I put this project back in the proposal stage and undid your transactions. Let me know if any of you want to withdraw your offers too. Otherwise they'll only go through if/when this reaches its minimum funding bar ($1k) again.

Jonas Vollmer

3 months ago

lol at the timing of the events here

Austin Chen

3 months ago

(for context: Jonas posted his reservations independent of my grant approval, and within the same minute)

Jonas Vollmer

3 months ago

Edit: comment retracted

Fazl Barez

3 months ago

Thank you for your input, Jonas. I'm interested to understand the nature of your significant reservations. One of the appealing aspects of Manifund is its decentralized structure, which encourages open dialogue. This helps to counteract the traditional system where funding often depends on personal networks and reciprocal favors.

Would you be willing to share more details privately, given your concerns about public disclosure?

Austin Chen

3 months ago

In light of Jonas's post and the fact that this grant doesn't seem to be especially urgent, I'm going to officially put a pause on processing this grant for now as we decide how to proceed. I hope to have a resolution to this before the end of next week.

Some thoughts here:

  • We would like to have a good mechanism for surfacing concerns with grants, and want to avoid eg adverse selection or the unilateralist's curse where possible

    • At the same time, we want to make sure our regrantors are empowered to make funding decisions that may seem unpopular or even negative to others, and don't want to overly slow down grant processing time.

  • We also want to balance our commitment to transparency with allowing people to surface concerns in a way that feels safe, and also in a way that doesn't punish the applicant for applying or somebody who has reservations for sharing those.

We'll be musing on these tradeoffs and hopefully have clearer thoughts on these soon.

Jonas Vollmer

3 months ago

@kiko7 I've sent you an email with feedback. You have my permission to share the email publicly on here if you would like to.

Fazl Barez

3 months ago

@Jonas-Vollmer I have responded to your email. After you have reviewed the email, we can then evaluate whether to make it public.

Jonas Vollmer

3 months ago

@kiko7 I don't have the time to engage further via email. Happy for you to decide for yourself!

Jonas Vollmer

3 months ago

@kiko7 I'm also fine with Manifund deleting all my public comments here in case others think they're too damaging or unfair given that I don't have the time to engage any further.

Fazl Barez

3 months ago

@Austin I'd like to request that Jonas's comment be temporarily removed, because the substance in the email Jonas sent is not nearly as concerning as the negative impact of the public comment. Once we have resolved the issues raised in the private email, we can consider reposting the comment, either as is or in a modified form.

Austin Chen

3 months ago

I've updated Jonas's comment above. Evan is also retracting his support for this grant, so we will be unwinding his $50k donation and restoring this project to be in the pending state.

Mindermann

3 months ago

I'm responding to the concerns Jonas raised with Fazl in a private email shared below, with permission.

Overall, these concerns don't seem substantial enough to warrant 'significant reservations' or even 'permanent repercussions'. I'm writing this because in my view these phrasings from Jonas will give a wrong impression (unless the email I quote below in full omits some important information) and could affect Fazl's reputation and future opportunities unnecessarily. And indeed it has led at least one person to put their funding on ice.

Disclosure: Fazl was my housemate. I've known Jonas for ~10 years.

> My concerns are:
> 1. In our conversations, I got the impression that you were primarily trying to impress me and create an impression that we're buddies, rather than e.g. discussing ideas on the object level. I also thought you tried to create the impression that you knew much more about AI alignment than you actually do. I don't think any of this makes you clearly unsuitable for this grant, but I've learned over the years as a manager/grantmaker that these signs tend to be good predictors of funders overestimating grantees, and their projects not working out as planned.

Regarding Fazl's technical skill, I think the view of a non-expert who has had one or two casual conversations with Fazl (apparently while waiting for food at a takeaway) shouldn't be grounds for 'significant reservations' or 'permanent repercussions' as stated in the original. Fazl has endorsements from accomplished ML researchers like David Krueger and a strong publication record that the grant evaluators here can check, plus at least two NLP papers that have been accepted but not yet published. (I can attest to his strong technical skill as well.) These factors seem substantially more important.

Regarding 'trying to impress', I think that for academic grants it shouldn't be a primary evaluation criterion whether the grantee is trying to impress. Many productive academics do this. Additionally, Jonas seems worried that Fazl would impress the funders on this website, but that does not seem relevant here: Fazl's whole interaction with them is shown on this website, and if any undue persuasion were happening here it could be pointed out publicly.

> 2. On top of that, I heard some negative stories (haven't verified them myself): E.g. that you tried to get into an office space you were repeatedly asked to leave, that a project you ran went poorly and you blamed your collaborators when it was clearly your responsibility, and another significantly negative story.

Jonas has told me which office space this was but not which project or other story. For the office space, I've read the email thread with the office ops and it looks very much like a miscommunication. Fazl was certainly not told to leave in these emails. According to Fazl, that also didn't happen in person. According to the emails, members of the office space invited him inside twice and it seems this was not following the protocol expected by the admin. But from the emails, the protocol also seemed a bit ambiguous and I'm still not sure what exactly was expected.

Given what I know about the office space story, which is the only one where I have some insight, and given that Jonas says he hasn't verified any of the stories, I also have some doubts about the other stories now.

Fazl doesn't know what the 'other significantly negative story' is or the project in question. FWIW Fazl works on and supervises a lot of projects at once and it's normal if one goes poorly once in a while (and sometimes it's actually someone else's fault, though Fazl says he doesn't recall blaming anyone for a project failure).

> 3. I think it's fine not to have a fleshed-out research agenda, but I'd at least like to see some specific preliminary ideas, which are often good indicators of whether the research will be good.

Grant evaluators can easily see the research proposal, so this should be (and has been) discussed in public and doesn't need to be part of a negative message without clearly stated concerns.

This is my impression based on Jonas' email. There might be omitted info I don't know about.

I'm not planning to engage further, just dropping my 2 cents :)

Jonas Vollmer

3 months ago

Just one nitpick in response to that: I hope it was clear that I was aiming to avoid 'permanent repercussions' for Fazl, rather than argue for them. I continue to have 'significant reservations'.

Jonas Vollmer

3 months ago

@Austin Would you mind changing my comment from "comment retracted" to "comment removed with author's consent"? 'Retracted' implies I don't endorse it anymore, but I asked for it to be removed as a favor to Fazl, rather than changing my opinion.

Jan Brauner

3 months ago

I mostly agree with what Mindermann wrote.

CoIs: Fazl is my housemate; Jonas was (briefly) my housemate in the past.

Austin Chen

3 months ago

Approving this project! It's nice to see a handful of small donations coming in from the EA public, as well as Evan's endorsement; thanks for all your contributions~

Joshua David

4 months ago

Establishing an AI safety lab at Oxford seems like a good idea in general, and I expect that research which focuses on mechanistic interpretability is particularly likely to yield concrete, meaningful, and actionable results.

Additionally, Fazl has a track record of competence in organizational management, as shown by his contributions to Apart Lab and his organizational work for the Alignment Jam / Interpretability Hackathon.

Disclaimer: My main interactions with Fazl, and my impressions above, were through Interpretability Hackathon 3 and subsequent discussions, which is how I heard about this Manifund project.

Disclaimer: I do not specialize in grant-making in an impact market context - my donation should be interpreted as an endorsement of the view that an AI safety lab at Oxford would be a net positive, not as an intentional bid to change market prices.

Renan Araujo

4 months ago

Interesting project! I'm curious about a couple of things:

  1. What would the research agenda most likely be like? (E.g. what you think would be the most exciting version and the realistic version)

  2. How many people do you expect would work on that agenda, and what would their backgrounds be? (E.g. would they already have an alignment-related background or just be technical folks interested in the field; PhD students or faculty; etc.)

Fazl Barez

4 months ago

Thank you for the thoughtful questions, Renan.

1- The research agenda is still taking shape; a key goal over the next 3-4 months is to further define the directions and priorities and to secure funding from the identified sources. However, I envision a significant portion focusing on interpretability, particularly interpreting reward models learned via reinforcement learning. Additional areas will likely include safe verification techniques, aligning with much of Stuart Russell's work as well as Phil's and David's areas of expertise.

2- Regarding team composition, we expect at least two existing research fellows to be involved and several PhD students to be hired. Most members will have strong technical backgrounds and solid foundational knowledge of the AI alignment literature. We aim to assemble a diverse team with complementary strengths to pursue impactful research directions.

Please let me know if you have any other questions! I'm excited by the potential here and value your perspective.

Renan Araujo

3 months ago

@kiko7 thanks! Ultimately, I decided not to evaluate this since I don't feel confident that I have the right background for that. I encourage others with a more technical background to evaluate this grant.

Chris Leong

4 months ago

I would be really excited to see the establishment of an AI safety lab at Oxford, as this would help establish the credibility of the field, the lack of which is one of the core problems holding alignment research back.

That said, I suspect that a proper research direction is crucial when establishing a new lab, as it's important to lead people down promising paths. I haven't evaluated their proposed directions in detail, so I would encourage anyone considering donating large amounts of money to evaluate them themselves.

Disclaimer: Fazl and I have discussed collaborating on movement building in the past.

Esben Kran

4 months ago

This seems like a high-EV project. Having worked with FB at Apart Research, I have been impressed by his work ethic and commitment to real impact. One of my worries in establishing a new lab is that it could get caught up producing low-impact research, but with him at the helm and the support of Krueger, there is little doubt that this lab will take paths towards concrete efforts to reduce existential risk from AI.

Additionally, the Torr Vision Group provides the credibility and support that other new labs would need to build up over a longer period, potentially speeding up the proposed project's path to impact. I do not specialize in grant-making and provide this donation as a call to action for other grant-makers to support the project.