Interpretable Forecasting with Transformers

Date created: 02/24/2023

Project description

I am working with Nuño Sempere on a project to extract latent probabilities from GPT-3.

Primary outcomes:

  • improve on the state of the art in anti-hallucination and truthful question answering using LLMs.

  • measure information retrieval + architecture tweaks vs crowd performance on prediction markets.

  • elicit explanations for the reasoning behind the model's decisions, both directly and indirectly
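
One simple way to read a "latent probability" off a language model is to renormalize the log-probabilities it assigns to the possible answer tokens. The sketch below is illustrative only and is not the project's actual method: the function name `latent_probability`, the token spellings, and the example log-probabilities are all assumptions, and no real model API is called.

```python
import math

def latent_probability(logprobs, yes_tokens=(" Yes", "Yes"), no_tokens=(" No", "No")):
    """Renormalize the probability mass on yes/no answer tokens into P(yes).

    `logprobs` maps candidate next tokens to natural-log probabilities,
    in the shape a model API might return for the first answer token
    (hypothetical input format).
    """
    p_yes = sum(math.exp(logprobs[t]) for t in yes_tokens if t in logprobs)
    p_no = sum(math.exp(logprobs[t]) for t in no_tokens if t in logprobs)
    total = p_yes + p_no
    if total == 0:
        raise ValueError("no yes/no tokens found among the candidates")
    return p_yes / total

# Made-up log-probabilities, for illustration only.
example = {" Yes": math.log(0.30), " No": math.log(0.10), " Maybe": math.log(0.05)}
p = latent_probability(example)  # mass on " Maybe" is ignored by the renormalization
```

Renormalizing over only the yes/no tokens discards mass the model places on other continuations, so this is at best a rough proxy for the model's belief.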

What is your track record on similar projects?

  • Our team is currently ranked #7 of 61 in the Autocast Competition (forecasting.mlsafety.org). We're prioritizing understandable, legible, and safe behavior above optimizing for capabilities.

  • Nuño is an expert on forecasting at the Quantified Uncertainty Research Institute. He is the author of forecasting.substack.com, was a summer fellow at FHI, created the "Estimated Value" sequence, made metaforecast.org, and is a founding member of samotsvety.org.

  • I've been at Microsoft for ~3 years, have a bit of experience with LLMs, did 5 internships, won multiple awards in international competitions (including a $35k prize in the HITB AI Challenge), was invited to speak at an IEEE conference, got into Stanford, and met Geoff Hinton once.

How will you spend your funding?

  • Paying rent for experimentation and testing on cloud GPUs. Only so much you can do with APIs.

  • We'll apply for a second round of funding to scale up our approach if initial results are promising.

Sheikh Abdur Raheem Ali

about 2 months ago

@ScottAlexander Is it possible to hold on to my current shares? Not interested in selling at the moment.

Austin Chen

about 2 months ago

I'm fairly sure that Scott would be happy to allow you to hold on to your current shares, with the caveat that if you don't accept this current offer, he may not make any other assessment or offer in the future.

Sheikh Abdur Raheem Ali

about 2 months ago

That's fine.

Sheikh Abdur Raheem Ali

How much money have you spent so far?

  • It’s hard to calculate this but I’d claim it’s about USD 10k. More if you include opportunity costs. I can provide a breakdown of this budget upon request.

Have you gotten more funding from other sources?

  • Yes. Janus has provided OpenAI API credits and has reimbursed some of my other expenses. Nuño has been consulting. For the rest, I’ve drawn from savings by selling RSUs. 

How is the project going?

  • Got accepted to SPAR under Rubi Hudson, so this project is merging with the Manifund project "Avoiding Incentives for Performative Prediction in AI".

  • Plan to continue working on this agenda from Jan to Apr 2024; sent an application to AI Safety Camp.

  • Ran some basic experiments but bottlenecked on conceptual progress. Some false starts, no publishable artifacts so far, but working on it. Please get in touch directly if you'd like to hear more.

How well has your project gone compared to where you expected it to be? (Score from 1-10, 10 = Better than expected)

  • 3.3

Are there any remaining ways you need help, besides more funding?

  • A magic wand that reduces bureaucratic inefficiency.

Any other thoughts or feedback?

  •  Not for now!

Marcus Abramovitch

5 months ago

Let me know how this is going. Can maybe fund this.

Sheikh Abdur Raheem Ali

Briefly: Got access to the base model of GPT-4 and am trying to explore why it’s better calibrated than the instruction fine-tuned RLHF version. Also in DMs with the CEO of Lambda Labs to discuss renting H100s. I’ll fly out to Berkeley from July 10 to September 7 if I get a U.S. visa. Collaborating with the Cyborgism stream. I’m also transferring teams to work on Bing Chat and am trying to get researcher access to GPT-4’s vision module.

Primary expense at this stage is the cost of our time. More investment would be a signal that this work is valuable, which would make it easier to prioritize over alternative projects.

Further progress is not blocked on funding, but more funding would accelerate it, although I can’t claim to know the precise relationship.

We would likely spend the money to free up more focus time.

Sheikh Abdur Raheem Ali


The Autocast Competition (mlsafety.org) was closed due to the FTX collapse, so we decided to scrap the paper and reorient towards eventually selling the project to Anthropic instead.

• No outputs on the development side in the last two weeks: I needed a break after pushing to wrap up work before my vacation, and continuous exhaustion isn't sustainable.

• Applied to SERI MATS to get more time to work on this and got an informal accept from the mentor we targeted, but we are still waiting for official decisions.

Sheikh Abdur Raheem Ali

@Austin thanks! Quick answers:

Deliverables: We'll open source our methods, code, models, data, animations, and any additional information needed to reproduce the experimental results. We aim to submit a paper to NeurIPS 2023 within the next 8-9 weeks. Public release date is currently 14 weeks from now.

Commitment: I am taking 4 weeks off (starting late April) to focus primarily on this project. As for when to scale: it's hard to give a firm date since the field moves so fast, but this is really a function of how much we raise. Some parts of our architecture are scale invariant, others plug into publicly available LLMs, and some components of the system are traditional software. On the margin, dollars spent on inference and evaluation (e.g., for ablation studies or prompt testing) are more useful than dollars spent on training, at least until you get pretty far down the list of ideas. We'll make the decision to scale when we think it's a good idea, and we don't yet know precisely when that will be.

Austin Chen

9 months ago

Hi Sheikh! This seems like a neat project - it's awesome to hear that Nuño is involved here too. A couple of questions that might help investors evaluating this:

  • What are the deliverables if experimentation goes well -- eg published paper? Blog post? Interactive website?

  • Roughly how much time do you and Nuño expect to put into this before deciding whether to scale up?


Aaron Lehmann

9 months ago

I'm curious to learn more about the second primary outcome, "measure information retrieval + architecture tweaks vs crowd performance on prediction markets". This sounds like the main tie-in to forecasting. Is the idea to predict the probability of an event using GPT-3 (either by asking directly or extracting probabilities in a lower-level way) and compare the accuracy of these predictions to prediction markets?
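
The comparison described in this question could be scored, for instance, with the Brier score: the mean squared error between forecast probabilities and resolved binary outcomes, where lower is better. The sketch below is only an illustration under that assumption; the helper `brier_score` and all of the numbers are made up, and the project does not state which metric it uses.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal in length")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Made-up probabilities for three resolved yes/no questions, for illustration only.
model_probs = [0.9, 0.2, 0.6]   # hypothetical model-derived probabilities
market_probs = [0.8, 0.1, 0.7]  # hypothetical prediction-market prices
resolved = [1, 0, 1]            # 1 = the event happened, 0 = it did not

model_brier = brier_score(model_probs, resolved)
market_brier = brier_score(market_probs, resolved)
```

Scoring both sets of forecasts on the same resolved questions keeps the comparison like for like; a meaningful result would need far more than three questions.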