WhiteBox Research: Training Exclusively for Mechanistic Interpretability

Brian Tan

Grant (complete)

$12,420 raised

Project Summary

We want to identify and train a highly selected cohort to focus exclusively on mechanistic interpretability (MechInterp) research, with the explicit aim of making substantial progress on Neel Nanda's 200 Concrete Open Problems in Interpretability. We believe MechInterp is an important part of the portfolio of AI safety agendas, and its tight feedback loops make it a uniquely accessible way to approach alignment. This effort will be carried out through a non-profit initiative known as WhiteBox Research.

We are requesting funding for 1.9 FTE (0.8 for Clark, 0.7 for Brian, 0.4 for Kriz) for 9 months. Roughly 70% of our time will be focused on finding & training people in MechInterp, 20% on upskilling ourselves, and 10% on anything else that helps us achieve our long-term goal: to foster a thriving alignment hub in Southeast Asia that can diversify and complement the work being done in London and the Bay Area.

Clark Urzo is a part-time independent alignment researcher and a SERI MATS Winter 2022 virtual participant under John Wentworth, and he will be leading the training program.

What are this project's goals and how will you achieve them?

The main objective of this project is to create a reproducible process of training people to become mechanistic interpretability researchers in particular—as opposed to alignment researchers in general—thereby potentially gaining a comparative advantage in both quality and cost-effectiveness over larger but less focused programs.

Over the 9-month grant period, we plan to operationalize this goal through a series of milestones:

Our training program, which is largely based on the flipped-classroom and mastery-learning models and is described in more detail here, aims to have 10-15 participants and is expected to run for 8 weeks. The program will be held in or near a top university in Metro Manila, where we’re all based.
The Alignment Jam holds its Interpretability Hackathons roughly every three months. Our next major goal is to have our cohort place at least 3rd in one of them or, if the Jam does not keep to its schedule, in another equally credible benchmark or competition (possibly organized by us).
Once we can do well in a research hackathon, our next major milestone is to have the strongest members of our cohort produce public-facing work (subject to an internal infohazard policy we’re still working on) that is endorsed by other researchers in the field. In particular, 3-5 high-quality posts in a joint research blog, similar to Anthropic’s Transformer Circuits Thread project, would be a valuable test of the group’s ability to conduct useful research and/or distillations for the community.
Lastly, we will conduct a series of research sprints to systematically attempt the problems in Neel Nanda’s list of 200 Concrete Open Problems in MechInterp. The problems are not necessarily of utmost importance to interpretability in general, but because they are highly neglected (as seen in this spreadsheet), they can serve as a useful measure of our group’s ability to produce research that other people can actually build on. Concretely, we initially aim for an in-depth write-up of a decisive solution to at least one open problem, and then for a regular cadence of such posts in the months following the program.

By the end of the grant period, we want to have trained at least three people (not including us) who can do serious MechInterp research (e.g., who become SERI MATS research fellows in 2024 or reach a certain bar of research quality).

We believe our idiosyncratic approach is worth trying for two main reasons:

Similar initiatives targeting early-career people, such as BlueDot Impact’s AI Safety Fundamentals course and EffiSciences’ ML4Good program, offer a broader curriculum that covers many alignment topics. We think that an entire research group focusing on a narrow area and building deep expertise in it is an underexplored strategy, with Redwood Research’s REMIX program being the only example in interpretability we are aware of.
Metro Manila (and by extension Southeast Asia) is an attractive place to do upskilling and research owing to its drastically lower cost of living (e.g., a fully furnished one-bedroom condominium unit in the city costs $250-400/mo to rent), English-speaking population, and lax visa requirements. It can therefore serve as an alternative location for alignment researchers who want to work comfortably for far longer on the same amount of funding, as well as attract alignment researchers who would otherwise be unable to afford moving to London/Berkeley given the current funding climate.

How will this funding be used?

Our preferred amount is a total of $73,460 USD for 9 months of funding at 1.9 FTE. This will fund Clark Urzo for 0.8 FTE, Brian Tan for 0.7 FTE, and Kriz Tahimic for 0.4 FTE, along with our operational expenses.

Our minimum amount to do this project is 6 months of funding for 1.8 FTE at $34,700. However, we're open to accepting any amount and can adjust our plans based on how much funding we get.

Who is on your team and what's your track record on similar projects?

Our team is composed of Brian Tan, Clark Urzo, and Kriz Tahimic.

Callum McDougall, co-founder of ARENA, has agreed to be an adviser of ours.

How the work will be split

Our plan is to have Clark and Kriz split the work on the curriculum design and teaching/mentoring for the training program, while Brian will focus on the less-technical aspects of the program (e.g., marketing, operations, and project management). We will also likely tap the help of someone in EA Philippines or a local EA student chapter to help out with some operations tasks (like event logistics).

Clark (LinkedIn):

I participated in the virtual workshops of the 2022 Winter Cohort of SERI MATS under John Wentworth (though they were handled primarily by Joe Collman). I was also a facilitator in the 2023 AGI Safety Fundamentals course, and I am currently a participant in PIBBSS’ Key Phenomena in AI Risk reading group led by Tushant Jha.
Also in 2022, I received a small grant from the FTX Regranting Program via Olivia Jimenez and Akash Wasil to pivot to technical alignment research. Previously, I worked briefly as a machine learning engineer optimizing video compression for a startup in California called Active Theory Inc.
Aside from doing research, I also have extensive entrepreneurial experience. In 2015, I co-founded Veer, one of the first virtual reality companies in the Philippines, which produced brand activations for major local companies such as SM Cyberzone and Jack & Jill across 20+ cities. Our primary product was a virtual crew training program certified by the Civil Aviation Authority of the Philippines (CAAP). I was also a key organizer of XR Philippines (previously VR Philippines), where I handled strategy, managed hackathons with several dozen teams, and ran targeted promotions that landed us multiple interviews on national news and radio broadcasts.
I also won a grant from Pioneer.app in 2018. Pioneer is a program run by Daniel Gross and funded by Stripe and Marc Andreessen. During this time, I was featured in the online business magazine e27.
Owing to a lifelong interest in rationality, I have spent over 2,000 hours reading rationalist material written by people like Eliezer Yudkowsky, Julia Galef, Gwern Branwen, and so on. I also briefly co-ran a writing group for prolific writers in the r/slatestarcodex subreddit with Alexey Guzey in 2018, and will likely do the Epistea Residency program in Prague this September simultaneously with this project.

Brian (LinkedIn):

I co-founded EA Philippines in 2018 and was on a CEA Community Building Grant to work full-time at EA Philippines in 2021. EA Philippines is now one of the largest EA groups in an LMIC.
At least 12 people in EA Philippines have a strong interest in AI safety/risks, including Clark, Kriz, and Ayrton San Joaquin (former Teaching Assistant at CAIS). I’ve had a minor but helpful role in five people’s AIS journeys.
For my AIS knowledge, I’ve consumed most of Holden Karnofsky’s Most Important Century (MIC) series and his posts on its implications. I’ve also consumed most resources up to week 3 of BlueDot’s AISF alignment curriculum and am slowly working through the resources in Neel Nanda’s MechInterp guide (starting with the MechInterp prerequisites).
I have worked at CEA as a group support contractor since Dec. 2021, supporting EA groups. (I’m looking to transition from this role to focus on AIS, e.g., by working on this project.) Before working at CEA, I was a UI/UX designer for 1.5 years.

Kriz (LinkedIn):

I'm a 4th-year CompSci student with a full scholarship. I co-organize EA Taft (in DLSU) and was accepted into CEA's Organizer Support Program. My working thesis tries to mitigate superposition via L1 regularization and adversarial training, and is inspired by Anthropic's Toy Models of Superposition paper. I'm also currently receiving coaching from Effective Thesis under Conor Spence.
My journey into EA and AI safety includes finishing the Intro EA Fellowship, In-Depth EA Program, AGISF - EA Cambridge, and Existential Risk Workshop - GCP, as well as attending EAG Washington & EAGxSingapore. Currently, I'm following Neel Nanda's guide, "Concrete Steps to Get Started in Transformer Mechanistic Interpretability." I finished the section "A Barebones Guide to Mechanistic Interpretability Prerequisites" and am now proceeding with Andrej Karpathy’s micrograd tutorial.
I've given a talk on AGI x-risk at an EA Philippines event, I've facilitated AI Alignment, MLSS, and In-depth Reading Groups in EA PH, and have had 1:1’s with people on AI safety. This resulted in 10+ people actively looking to volunteer in AIS field-building, with 3 taking significant steps, including one who plans to pursue a technical AIS-focused PhD.

References

Clark’s:

Chris Leong - Founder of AI Safety Australia and New Zealand
Joe Collman - Technical Lead, SERI MATS
Elmerei Cuevas - Executive Director of EA Philippines

Brian’s:

Amarins Veringa - (my manager), Post-Uni Groups Strategy Lead at CEA
Dewi Erwan - Co-Founder of BlueDot Impact
Nastassja Quijano - Co-Founder of EA Philippines

Kriz’s:

Elmerei Cuevas - Executive Director of EA Philippines
Conor Spence - Coach at Effective Thesis
Wrenata Sproat - Global Challenges Project

What are the most likely causes and outcomes if this project fails? (premortem)

We don’t get at least 10 reasonably talented people to join our training program, or no more than five people complete it.
1. Mitigation: Since the cornerstone of this project’s success is the initial seed of people we choose, we will spend a good proportion of our effort and time on outreach. We will filter for motivation first and ability second (drawing from a pool that is already highly selected, e.g., IMO participants and the top STEM magnet schools in the country).
Our training is not good enough for them to produce high-quality research (e.g., to win an Alignment Jam).
1. Mitigation: We (especially Clark and Kriz) will put considerable time (including outside the FTE of this grant) into upskilling in MechInterp. Clark and Kriz will produce projects to be posted online (particularly on LessWrong) and also attempt to place in a preceding Alignment Jam themselves. We’ll also seek advice from other established MechInterp/AIS researchers. Callum McDougall, co-founder of ARENA, has agreed to be an adviser of ours.
The people we train and/or the MechInterp research we help produce contribute to AI capabilities significantly more than AI safety. We think that this downside risk is small because:
1. We strongly believe that a community centered on open discourse will achieve healthier epistemics in the long run than a community where beliefs are forced top-down. We trust that the kind of person who would do well as an alignment researcher would be persuaded by the sheer strength and veracity of safety arguments eventually, as long as we’re earnest and patient about their concerns.
2. That said, we will not offer support to people who wish to work in a non-safety-related role at one of the world’s top AI labs, where we think most of the downside risk is concentrated, or to those who want to do explicit capabilities research.
3. We will also enforce an infohazard policy and disqualify repeat offenders from the program.

What other funding are you or your project getting?

We will also apply for funding for this project from Meta Charity Funders and the Long-Term Future Fund, likely by the end of August. We have not received funding for this project so far. If you'd like to get in touch with us privately, you can email Brian at work.briantan@gmail.com.

Brian Tan

about 1 year ago

Final report

This document contains a retrospective on Cohort 1 of our AI Interpretability Fellowship, which we held from February to August 2024, as well as some details on our progress so far in Cohort 2 of our fellowship: https://docs.google.com/document/d/1bQaUejzAe7DxyNQWN5aSzA_I49sq2yV2S7N5ZNPPABI/edit?tab=t.0

How we spent this Manifund grant

Staff salaries - $8,836.55 (this funded Clark Urzo, Kriz Tahimic, and me from around Dec. 1, 2023 to early January 2024, as well as Kyle Reynoso for February 2024)

Operations costs until late March 2024 (e.g., our welcoming retreat’s costs, conference room rental and food for our sessions, etc.) - $3,583.45

The rest of the costs for Cohort 1 of our fellowship were funded by two grants from the Long-Term Future Fund.

P.S. Sorry for the delay in closing this project!

Brian Tan

over 2 years ago

Progress update

To add to our last update, here are some additional updates over the last four weeks:

We’ve further clarified our strategy and plans for the next year.
- Our plan for the next year is to run two cohorts of our 5-month, part-time AI Interpretability Fellowship in Manila to produce junior mechanistic interpretability researchers. We’ve created a Theory of Change for the fellowship here.
- Once we’ve completed two rounds, we plan to open our doors to those in Southeast Asia for the third round of our fellowship. The third round will likely be a full-time, 1-2 month version of our fellowship (e.g., starting June 2025).
- Through our fellowships, we aim to kickstart and develop the AI safety community in Manila and Southeast Asia.
We also have these updated goals for our fellowship’s 1st cohort:
- Have our fellows solve, or make substantial progress on, a handful of concrete open MechInterp problems that have not yet been solved (e.g., those in Neel Nanda’s list) by the end of September 2024
- Get at least one fellow to be counterfactually accepted by the end of 2024 into a full-time AI safety research fellowship (e.g., MATS’s Winter 2024-25 program)
- Have at least four fellows spending at least 10 hrs/week working on alignment-oriented upskilling and/or interpretability projects by the end of 2024
Unfortunately, our team member Kriz Tahimic left due to health issues. We are grateful for Kriz’s help in co-founding and launching WhiteBox with us. Given his departure, we’ve increased Kyle Reynoso’s responsibilities and extended Kyle’s contract to work with us until August at 0.5 FTE (and past August once we get more funding).
We’re currently fundraising for $92,300 to fund us until March 2025. (Our current funding will only last us until July or August.) The $92,300 would fund:
- Additional operations costs for cohort 1, such as mentor and fellow stipends ($5,100)
- Our 2nd cohort from late September 2024 to March 2025 ($87,200)

If you’re interested in funding or donating to us, you can contact me at brian@whiteboxresearch.org. We can send you our fundraising proposal and information on how to donate.

What are our next steps?

There are three main goals we want to achieve by August:

Conclude our Trials phase (training) with our planned in-house interpretability hackathon and shave off its remaining warts and inefficiencies for the next cohort
Have our fellows complete research excursions on selected problems in Neel Nanda’s list of concrete open problems (COPs), under the guidance of experienced external mentors
Fundraise enough money to fund our 1st and 2nd cohorts, as mentioned above

As shown in our Theory of Change, we will focus on having our fellows work on the COPs so they can upskill in interpretability research rapidly. However, we’re open to other proposals from mentors if there are adjacent problems that our fellows can help them with, so long as they: a) can practice MechInterp fundamentals in those projects, and b) can realistically complete the project by the end of the Proving Ground.

We’re also open to such proposals from our more advanced fellows, following the same constraints as above, and if we and the available mentors deem them viable. This is because promising researchers often have very strong opinions on what they wish to work on, and this can make them more motivated to complete the rest of the fellowship.

Note also that this is not a bet on the COPs being vital to alignment, nor do we expect our fellows to produce immediately useful research by the end of the program: they are and will still be new to the field after all. Rather, we hope the problems will serve as excellent forcing functions for our fellows to get better at the fundamentals of MechInterp as quickly as possible.

How can others help us?

We are still looking for 2 to 4 more research mentors with experience in mechanistic (or general) interpretability research for our Proving Ground (research phase) from June to August. Mentors would only need to meet virtually with 1-2 fellows each week for ~45 minutes to provide research guidance.
- They can choose to oversee more than one person or duo. As mentioned above, we are also open to having our fellows help their mentors in some MechInterp-adjacent task. For example, mentees can resolve accessible open issues in an existing interpretability project in exchange for the mentor-mentee relationship, as long as they’re properly scoped to fit in our Proving Ground phase. If you are interested in being a mentor for our fellowship, please contact us at team@whiteboxresearch.org.
If you’re interested in or have experience in mechanistic (or general) interpretability, you can join our Discord server here and engage with people in our community, including our fellowship participants.
As mentioned, if you’re interested in funding us, you can contact me at brian@whiteboxresearch.org!

Brian Tan

over 2 years ago

Progress update

We at WhiteBox Research have been progressing well since we got our initial regrant from Renan Araujo, and we’d like to share some updates on our progress below! (We will have a strategy session this week, and we’ll share another update within the next two weeks about our next steps and how others can help us.)

What progress have we made?

Here are our key achievements since September 2023 (up to March 19, 2024):

In November, we were approved for $61,460 in funding from the Long-Term Future Fund! Together with our funding from Manifund, this funds us until around August 2024.
We finalized more details of our program and named it the WhiteBox AI Interpretability Fellowship. It’s a five-month training and research program in Manila to master the fundamentals of mechanistic interpretability. We created this primer for the fellowship, and our training phase’s curriculum overview can be found here. [1]
We got 53 applicants and accepted 13 participants into our fellowship, surpassing our goal of getting 50 applicants and 12 participants. [2] [3] Our marketing and application process also helped us start building a wider community of people interested in AI interpretability. [4]
We onboarded Kyle Reynoso as a part-time teaching assistant in February, and he has contributed significantly since then. [5]
We ironed out a process for how participants can view, submit, and receive feedback on their exercise answers more seamlessly via GitHub Classroom and nbgrader.
We’re in the fourth week of our fellowship’s two-month training phase. So far, we’ve received generally positive feedback on the three Saturday sessions and the two-night retreat we held for participants, and we’ve maintained a consistent weekly tempo of adjustments and improvements to various aspects of the program.

Here are some footnotes to expound on our progress above:

[1] Since we opted for less experienced but high-potential participants (their average age is 21), we will probably have to cover more of the prerequisites than other programs (e.g., ARENA) do, which means we may only delve deeper into interpretability in the research phase of our program in June.

[2] We opted for a three-stage application process. Stage 1 involved solving Bongard problems and answering essay questions about alignment material (namely Ajeya Cotra’s Saint/Schemer/Sycophant post and Lee Sharkey’s Circumventing Interpretability post). Stage 2 tested their ability to solve coding problems that are tricky to solve even with GPT-4, and Stage 3 consisted of an unstructured interview largely based on the format of Tyler Cowen’s insightful podcast, Conversations with Tyler.

[3] Some of the 13 people we accepted include an IOI silver medalist, a 19-year-old who recently got seed funding for his B2B startup, a fluent Lojban speaker who did contract work for an OpenPhil AI safety grantee, and a Master's student who won a gold medal in Taiwan for chemistry research she did in high school.

[4] We spent around a month marketing and community building to attract people to apply for current and/or future cohorts of our fellowship. We ran a successful virtual “Career Planning in the Age of AI” salon at the start of the year with around 27 attendees, and four people whom we ended up accepting joined it. We also started a community Discord server where people from the ambient community can interact with and discuss all sorts of questions with our participants, as a preliminary step towards wider community building in Southeast Asia. (We sent an invite to our server to all applicants, including those we rejected, some of whom already have more background in ML.)

[5] Our TA, Kyle Reynoso, graduated with the highest honors as a CS major at the top university in the country, was an officer in a local student EA chapter, and has a day job as an ML engineer for a marketing compliance AI startup in Australia.

Austin Chen

over 2 years ago

@briantan appreciate the update, especially how in-depth it is; this looks like good progress and I'm excited for the rest of your program!

Brian Tan

over 2 years ago

Thanks @Austin!

Austin Chen

almost 3 years ago

Approving this project! I also especially appreciated that Kriz set up a prediction market on whether they would get to their higher bar of $37k~

Brian Tan

almost 3 years ago

Sorry for the late reply here, but thanks Austin! (Oh and to clarify, Clark set up the prediction market, not Kriz!)

donated $12,000

Renan Araujo

almost 3 years ago

TLDR: I decided to regrant $12k to this project. I’m excited about an organized AI safety training program in an under-exposed, important region (Southeast Asia). I think the core team seems promising and worth the investment, despite their juniority. I think getting experienced mentors will be the main challenge (among others), but I think the team is aware of the relevant failure modes and taking the steps necessary to mitigate them. I’d be excited about others donating at least $23k more to this project to make their MVP possible.

Why I’m excited about this project

I’m keen to see new programs happen outside the main hub as a way to widen the surface area of opportunities for talented folks to engage with AI safety. Southeast Asia is one of the regions I’m most excited about due to its large population and geopolitical relevance.
The core team seems organized and promising. They’re quite junior, but seem worth the investment as a way of skilling up by doing. This seems relevant to me especially considering there aren’t other groups trying to fill this gap, as far as I can tell – this project can plausibly allow them to become the experienced folks that would guide others in the future.
To mitigate their juniority, they’re picking a research agenda that has a track record of being useful for getting people interested in AI safety research + develop useful skills for AIS-relevant work. They’re also explicitly inserting their project into a broader pipeline, and establishing sensible metrics of success (e.g., participants winning Alignment Jams, getting into SERI MATS).
I had a call with Brian months ago and another with the whole team today. They gave me some more details about how they’re planning to skill themselves up and mitigate some of the concerns I mention here and Gaurav mentioned above. This made me even more confident about this grant.

Challenges and concerns

I’m concerned about their ability to provide high-quality mentorship to program participants, considering their juniority and potential limitations around getting senior mentors involved.
- They haven’t heard back from some relevant people yet (e.g., Neel Nanda), and haven’t run similar programs in the past.
- I’d be keen for someone with experience in this kind of program to share their expectations about how this will go.
- However, as I mentioned above, this seems like a positive bet in expectation to me.
Creating talent that will end up doing capabilities research
- Mech interp is quite dual-use and being the first program in the region to skill people up in this might end up hyping capabilities rather than AIS.
- However, I think a) they’re sufficiently aware of this failure mode, and b) AI is sufficiently mainstream that I expect this failure mode not to have a big counterfactual downside.
I worry they won’t find sufficiently talented people, and that investing in talent around existing hubs might be more cost-effective.
- I think this applies to all projects aimed at field-building outside hubs. However, I think we (as a community) haven’t invested enough in experimenting with programs like these yet, so the information value by itself seems worth it, in case there are low-hanging fruits available considering a lot of talented people can’t effectively go to the main hubs due to, e.g., visa limitations. I’ve been more excited about this due to my experience doing talent search via Condor Camp (Brazil) over the last 1.5 years.
As junior folks without a strong track record, I worry they might not be skilled enough to run an entire org/project by themselves. Maybe they won’t be able to follow through on their plans.
- I think they individually have enough experience in their fields that makes me confident about betting on them. Particularly, in my interactions with Brian, he’s seemed quite organized and competent, and I’ve appreciated his work at CEA and setting up EA Philippines. I know less of the other two team members, but at first glance they seem to have complementary skill sets and experience.

Brian Tan

almost 3 years ago

Hi Renan, we really appreciate your decision to give us a regrant! Thanks also for sharing your thoughts about our project. We're taking your challenges/concerns into account, and we're already coming up with concrete plans to mitigate them.

Gaurav Yadav

almost 3 years ago

*This was written very quickly, and I may not agree with what I'm saying later on!

Here are some questions and thoughts - I can't commit to funding at the moment, but I would like to share my thoughts.

Having spent roughly 1-1.5 years community building and observing Brian quite active on the EA Groups Slack and through email communications, I'm left with the impression that Brian is quite agentic. I hold a high prior on the plans of this proposal being executed if funded. Thus, I can see plans being made and things being carried out.

I also hold some confidence that establishing another hub might be beneficial, although I'm not entirely sure how to reconcile this with the idea that those interested in working on alignment might derive more value from visiting Berkeley than going to a new hub.

A few concerns do arise, however. The proposal mentions research sprints to solve the COPs, and while this approach seems suitable for less time-intensive tasks, I question its overall efficacy (45% sure this is true). I believe that rushing things or working on them quickly might not be the most conducive to learning.

Regarding the statement 'Due to being highly neglected,' I'm under the impression (60% sure) that interpretability is slightly saturated at the moment, contrary to the assertion that it's heavily neglected.

My final concern is about mentorship. It appears that only one person on the team has formal mentorship or experience in MI. This is concerning, particularly if you're planning on onboarding 10-15 people, as having one person mentoring them all is going to be challenging. More mentorship (and more experienced mentorship) might be necessary to identify and correct problems early and prevent suboptimal strategies from being implemented.

Brian Tan

almost 3 years ago

Hi Gaurav, thanks for weighing in on our project! Here are our thoughts on what you said, written mainly by Clark:

We agree there’s value in visiting Berkeley if people had the means, but we think it’s important there be more alignment hubs in various regions. We think that a good number of potential AIS researchers in Southeast Asia would find it costly and/or hard to visit or move to Berkeley (especially in the current funding landscape), as compared to visiting or working in Manila / SE Asia.
On research sprints to solve COPs: there are nuances to speed. Optimizing for paper-writing speed, for example, doesn't make sense, nor would treating the problems as LeetCode puzzles you can grind. The kind of speed we're optimizing for is closer to rate of exploration: how can we reduce our key uncertainties in a topic as quickly as possible? Can we discover all the mistakes and dead-ends ASAP to crystallize the topic's boundaries rapidly? Can we factor the open question into two dozen subquestions, each clearly doable in one sitting, and if so, how many of them can we do in a given timeframe? The crucial point is this: moving around produces information. We want to ruminate on questions in the middle of coding them up, develop the habit of thinking through problems in the space of a Jupyter notebook, and shrink this loop until it becomes second nature. We have also emailed Neel Nanda and Joseph Bloom about our project and aim to get their advice, so we won't veer too far off course while still learning to walk on our own.
On mentorship, we expect to do well enough in the training phase, but we likely need more mentorship in the research phase. That's why we're going to get a research adviser. During the research phase, the students will (mostly) get advice from Clark and Kriz, while we take advice from a research adviser. The goal is eventually to train ourselves and/or get enough people on our team so that we can confidently do the advising ourselves. This is also why we're adopting the flipped classroom model: we'll only have to produce/curate the learning materials once, and then just focus on getting them to do exercises. We're quite confident this is doable as Clark has taught classes of more than 40 people before.

Let us know if you have more thoughts or questions!

Gaurav Yadav

almost 3 years ago

@briantan Hi Brian, don't have more thoughts or questions at the moment, but thanks for the thoughtful reply, these seem good!

Brian Tan

almost 3 years ago

@GauravYadav No worries, thanks Gaurav!


Renan Araujo

almost 3 years ago

@briantan @GauravYadav just wanted to say this discussion was useful for me, thanks for bringing up those points and for the responses!

Brian Tan

almost 3 years ago

@RenanAraujo We're glad to hear!