What progress have you made since your last update?
The biggest update to Calibration City this month is the addition of a few new tabs!
🤔 First up we have the introduction, a sorely needed explainer for this whole “calibration” thing. It's mainly formatted as a dialog-based blog post that explains the basics of quantified predictions, calibration, accuracy, prediction markets, and the site's purpose. I hope it introduces the ideas behind prediction markets to a few new readers, right alongside the statistics that (potentially) back them up.
🥇 Up next we have the accuracy tab, with a set of graphs I teased last month. Ever wondered if longer markets are more accurate? What about markets with more traders? Now you can compare all markets across every supported platform against any attribute we measure, and you can combine that with any of the other standard filters! Do sports markets on a specific platform get more accurate as they approach their close? Are markets that resolve yes more accurate than those that resolve no? Experiment to your heart's content!
🔍 Wait, which markets are in that bin of sports markets on that one platform that resolved no? Well, to find out you could go search on that site… or you can use the new list page! List, sort, filter, and browse the markets as much as you like. (The API endpoint that powers this page can also be used to download all of the markets in my database, in case you're interested in double-checking my math or rolling your own 😉)
👨‍🎓 Interested in learning more about the site? Head on over to the FAQ tab! It'll answer your question… as long as your question is “where can I learn more about scoring prediction markets?” or “give me more nitty-gritty implementation details”. I've been answering actual frequently asked questions directly on the relevant pages, through better wording or hover text, so nothing has made it to this page yet 😅 Feel free to suggest what I should add, though!
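To make the accuracy tab idea concrete, here is a minimal sketch of "accuracy against an attribute, plus a filter": bucket markets into ranges of one attribute and average the Brier score in each bucket. This is an illustration under stated assumptions, not the site's actual code. The `Market` fields, the `accuracy_by_bins` helper, and the choice of the Brier score at a single point in time are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Market:
    # Hypothetical fields for illustration; not the site's real schema.
    platform: str
    category: str
    num_traders: int
    prob: float          # market probability of YES at the measured time, in [0, 1]
    resolved_yes: bool

def brier(prob: float, resolved_yes: bool) -> float:
    """Squared error between the forecast and the 0/1 outcome (lower is better)."""
    outcome = 1.0 if resolved_yes else 0.0
    return (prob - outcome) ** 2

def accuracy_by_bins(markets, attribute, edges, keep=lambda m: True):
    """Mean Brier score per half-open bin [lo, hi) of `attribute`.

    `keep` plays the role of the standard filters (platform, category, ...).
    Empty bins are omitted rather than reported with a misleading value.
    """
    results = {}
    for lo, hi in zip(edges, edges[1:]):
        scores = [
            brier(m.prob, m.resolved_yes)
            for m in markets
            if keep(m) and lo <= getattr(m, attribute) < hi
        ]
        if scores:
            results[(lo, hi)] = mean(scores)
    return results

# Toy data, made up purely to exercise the function.
markets = [
    Market("A", "sports", 5, 0.6, True),
    Market("A", "sports", 50, 0.9, True),
    Market("B", "sports", 80, 0.2, False),
    Market("B", "politics", 8, 0.7, False),
]
sports_by_traders = accuracy_by_bins(
    markets, "num_traders", [0, 10, 100], keep=lambda m: m.category == "sports"
)
print(sports_by_traders)
```

Swapping `"num_traders"` for another numeric attribute, or changing `keep`, reproduces the "compare against any attribute, with any filter" idea in miniature.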
What are your next steps?
🏆 More scoring options! We can already calculate and show the Brier score for every market, but there are many other scoring rules out there. I plan to add logarithmic scores, spherical scores, and more.
🖇️ Even more platforms! After adding Polymarket (and then working with their dev team to get even more information), I paused adding new platforms to the site so I could get the user experience the way I wanted. Now I think the site is in a great state, so we can get moving on integrating even more data!
✅ Corpus of questions! The biggest issue with the site as it stands is that you're comparing apples to oranges: not all markets are equally difficult! I don't want to punish sites for catering their questions to their communities, and I don't want to reward sites that might attempt to “game” statistics like calibration or accuracy. I want to build on existing datasets, leverage tools to replicate questions across platforms, and build a corpus of questions large enough that users can confidently see how accurate different platforms were, both on key questions and in aggregate.
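For readers curious how the scoring rules mentioned above differ, here is a minimal sketch of the standard binary-outcome forms of the Brier, logarithmic, and spherical scores. All three are strictly proper scoring rules, but they differ in orientation and in how harshly they punish confident misses. The function names are hypothetical, and whether the site will use these exact conventions (for example, the single-probability Brier form rather than the two-outcome form, which is twice as large, or the natural log rather than log base 2) is an assumption.

```python
import math

def brier_score(p: float, outcome: bool) -> float:
    """Squared error of the YES probability vs. the 0/1 outcome.
    Lower is better; range [0, 1]."""
    o = 1.0 if outcome else 0.0
    return (p - o) ** 2

def log_score(p: float, outcome: bool) -> float:
    """Natural log of the probability assigned to what actually happened.
    Higher is better; maximum 0. A forecast of 0% on the actual outcome
    scores -inf (math.log(0) would raise, so that case is handled explicitly)."""
    q = p if outcome else 1.0 - p
    return -math.inf if q == 0.0 else math.log(q)

def spherical_score(p: float, outcome: bool) -> float:
    """Probability of the actual outcome divided by the Euclidean norm of the
    forecast vector (p, 1 - p). Higher is better; range [0, 1]."""
    q = p if outcome else 1.0 - p
    return q / math.hypot(p, 1.0 - p)

# Compare the three rules on a few forecasts of a market that resolved YES.
for p in (0.5, 0.8, 0.99):
    print(p, brier_score(p, True), log_score(p, True), spherical_score(p, True))
```

The log score is the only one of the three that is unbounded, which is why a single 0% forecast on something that happened dominates any average built from it.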
Is there anything others could help you with?
Absolutely: I’m not a technical writer, so please review what I’ve written! The introduction dialog is a quick summary of how I understand calibration and accuracy, but it could be wrong or misleading in important ways. That’s why I’ve set up this bounty for users to submit their feedback on the site, especially anything that was confusing, unintuitive, or incorrect.
I'm also interested in any prior art on grouping a corpus of equivalent questions, which we will need in order to compare accuracy across platforms. Aside from doing it manually, we could leverage the markets that MirrorBot has replicated or use LLMs to determine when two markets are “close enough” to be identical, but each of these approaches has drawbacks. Since this is going to be a big feature with a high future maintenance burden, I would be happy to hear any ideas that might make it simpler!
As before, I’m always looking for ways to make the site better. If you have any ideas for things that would bring you to the site more often or features you would love to see, I’m all ears.