I have been offered the opportunity to help set up a safety and interpretability lab within the Torr Vision Group (TVG) at Oxford. This grant provides 4 months of funding to set up the lab, define its strategy and research goals, and coordinate with collaborators.
With this funding, we will develop Oxford University's first AI safety research lab within TVG, under the guidance of Prof. Philip Torr and Dr. David Krueger. With this new lab, we aim to set new industry standards in AI safety and interpretability.
The project’s overall aim is to establish an AI safety research lab within the Torr Vision Group. For the project to succeed over the long run, we have three primary objectives:
Establishment: Build a lab dedicated to AI safety and interpretability at the University of Oxford's Department of Engineering Science.
Research Agenda: Devise a robust research agenda, leveraging the expertise of lab associates David Krueger and Phil Torr.
Funding and Scholarships: Secure governmental funding and scholarships for incoming PhD students to ensure long-term sustainability.
Total Funding Required
Total Salary for FB (4 months): $24,000
Travel Expenses: $5,000
Tax Expenses: $12,000
Compute and PA: $6,000
Productivity and Buffer: $3,000
Grand Total: $50,000
Fazl is a soon-to-graduate Ph.D. student with extensive research experience, particularly in AI safety and mechanistic interpretability. Beyond his academic achievements, Fazl has a noteworthy track record of delivering on projects that align with our mission.
Apart Research:
Mentored over 10 students, leading to two papers accepted at top ML conferences.
Invited speaker at multiple AI and AI safety conferences; mentor at several AI safety talent programs.
AI Safety Hubs:
Played an instrumental role in building communities to foster AI safety research in South Africa and Edinburgh.
Collaborated with Yoshua Bengio on three articles on safety.
David Krueger is well known within the AI safety community; his work spans deep learning, AI alignment, and AI X-risk.
Philip Torr's work spans deep learning, computer vision, and safety, and has been cited more than 82,000 times.
The mission of reducing existential risk from AI can fail in three central ways:
Lab establishment issues: A failure during the design and proposal of the lab could lead to the proposal not being accepted. Proposals to UKRI and EPSRC for grants above £10M typically have acceptance rates below 20%; we believe our team's chances exceed 50%.
Inadequate research planning: Pursuing a research path that is not effective at reducing existential risk. We mitigate this with our team's focus on safety, reward learning, and mechanistic interpretability.
Funding shortfall: Without sustained funding, the lab cannot be established or maintained long-term. We are currently exploring various sources of funding and, in addition to other avenues, are pursuing a substantial grant from the UK government to ensure a solid foundation for the lab.