Why I generally don't recommend internal prediction markets or forecasting tournaments to organisations
Given the success of the Good Judgment Project (where I spent some happy years), the book Superforecasting, the US Intelligence Community Prediction Market and the plethora of other projects exploding out of IARPA’s early investment in forecasting research, I am often asked why more firms and organisations haven’t set up their own internal forecasting projects to harness the benefits of these systems to generate useful information about the future, and why the few who did take the plunge have mostly abandoned their efforts.
A recent paper suggests that the predominant reasons big intelligence orgs haven’t adopted these systems are:
(i) bureaucratic politics
(ii) decision-makers lacking interest in probability estimates
(iii) lack of knowledge about these platforms’ ability to generate accurate predictions
But I fear much of the emphasis on these factors is self-serving on the part of the forecasting community. Of course bureaucratic politics, stick-in-the-mud leadership and ignorance are factors limiting the adoption of any new technology, but I often hear from leaders who are enthusiastic about prediction markets or forecasting tournaments and very keen to set up these kinds of projects to generate useful insights for their firm, and I still nearly always advise them not to. Why?
1. Internal forecasting tournaments and prediction markets help you be more accurate, but often only about things you care less about, in which case the tradeoff may not be worth it.
By setting up a forecasting project with the twin goals of maintaining an engaged user base AND generating useful data, you risk being caught between the two and doing neither very well. Forecasting tournaments and prediction markets are games which need players, and the players don't necessarily care about whether the market price or tournament ranking reflects useful information: they just want a fair and fun field to compete in. Ideal competitive forecast questions are simple and discrete with ironclad resolution criteria, e.g. "Will North Korea test-launch an ICBM which flies for more than 900 miles before the 31st October 2024?". What you care about is "How worried should we be about North Korea?" or even "What should we do about North Korea?". These questions overlap a bit, but not that much, and to keep your players around you need to make your topics look more like the former than the latter, trading away a lot of applicability as you do.
2. By the time you're asking the question, it might be too late. Forecasting tournaments and prediction markets are a great way to get answers to specific questions, but knowing which questions to ask is very difficult. Not just because they have to be written in a robust and unintuitive way (see point 1), but because you need some inkling that there's a there there before you can even consider a question, and by building a system which does nothing but respond to narrow, specific questions you are eliminating one of the things you most want forecasters to do: warn you about a problem or risk you hadn't thought of!
3. Most of the useful information you produce is about the people, not the world outside. Forecasting tournaments and markets are very good at telling you all kinds of things about your participants: are they well calibrated, do they understand the world, do they work better alone or in a team, do they update their beliefs in a sensible, measured way or swing about all over the place? If you want to get a rough epistemic ranking out of a group of people then a forecasting tournament or market is ideal. A project like GJP (which I am very proud of) was, contrary to what people often say, not an exercise primarily focused on producing data about the future. It was a study of people! The key discovery of the project wasn't some vision of the future that nobody else saw; it was the existence of consistently accurate forecasters ("superforecasters") and of techniques that improve accuracy. The book Superforecasting was all about the forecasters themselves, not the future we spent years of effort predicting as part of the study, predictions I haven't heard anyone reference other than as anecdotes about the forecasters.
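To make the "rough epistemic ranking" concrete, here is a minimal sketch (hypothetical forecaster names and question IDs, not GJP's actual scoring pipeline) of ranking participants by mean Brier score, the standard squared-error accuracy measure for probabilistic forecasts:

```python
from collections import defaultdict

def brier(probability: float, outcome: int) -> float:
    """Squared error between a forecast probability and the 0/1 outcome.
    Lower is better; confident wrong forecasts are punished heavily."""
    return (probability - outcome) ** 2

def rank_forecasters(forecasts):
    """forecasts: iterable of (forecaster, question_id, probability, outcome).
    Returns (mean Brier score, forecaster) pairs sorted best-first."""
    scores = defaultdict(list)
    for forecaster, _question, prob, outcome in forecasts:
        scores[forecaster].append(brier(prob, outcome))
    return sorted((sum(s) / len(s), name) for name, s in scores.items())

# Hypothetical example: two forecasters on two resolved questions.
records = [
    ("alice", "NK-ICBM-2024", 0.70, 1),
    ("alice", "ECB-hike-Q3",  0.20, 0),
    ("bob",   "NK-ICBM-2024", 0.95, 1),
    ("bob",   "ECB-hike-Q3",  0.90, 0),
]
for mean_score, name in rank_forecasters(records):
    print(f"{name}: mean Brier {mean_score:.3f}")
```

Note what the output is: a ranking of alice and bob, not a view about North Korea. The data is mostly about the people.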
4. They're very expensive to run. For something recommended by economists, it's surprising that the costs of running a project like this don't get much of a mention. You need a platform to record forecasts which fits with people's preferred mode of contribution, a project lead to keep people interested and enthused, a prize fund or some market-making mechanism, plus an enormous amount of time dedicated to coming up with and proofreading forecasting questions. And when you do eventually come up with your popular, unambiguous, discrete, dated and resolvable questions, they're often not very useful. This is normally a six-figure investment, minimum, and again, what are you actually able to produce for your money versus more conventional means?
5. They’re often unpopular. Most people don’t like forecasting, and the people who sign up for forecasting sites like GJOpen and Metaculus are unusual. Most people I talk to about this are also unaware that there is enormous overlap between the users of all of these prediction sites, so the pool is even smaller than it looks from the outside. Your organisation probably doesn’t employ many people like this.
What people really like is randomness: the size of betting markets is roughly inversely proportional to the amount of skill involved, with the most popular betting product being the almost entirely random lottery, which doesn't reward extra effort at all and means my five-minute impulse purchase has as much chance of winning as your 40 hours of analysis. Most firms who run some type of internal prediction system struggle to maintain engagement and keep people happy. This is fine if you employ 100,000 people, because you only need the 100 or so most interested to take part, but for a smaller org it is going to be a challenge.
Are forecasting tournaments and prediction markets ever useful?
Yes, under a few circumstances:
1. Where people are strongly incentivised to lie or are honestly wrong (often due to heightened emotions), having penalties for being wrong will eliminate a lot of knowing bullshitters ex ante, and the deluded and honestly wrong over time. Forecasting tournaments and prediction markets get so much airtime in public discourse because public discourse is full of people who love to lie and mislead, and because emotions run high there with bad epistemic effects.
Consider events like recent disease outbreaks, which were characterised by public worry, institutions engaged in CYA, politics shaping the range of acceptable things to say, and almost nobody (aside from a few bloggers) optimising for clearly communicating risk to the public. Having some type of mechanism to incentivise people to give their real views can be very valuable, even if that means some mistargeting of questions.
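The mechanism is simply a proper scoring rule: once claims carry scores, confident nonsense becomes expensive. A toy illustration (hypothetical numbers, using the logarithmic score rather than any particular platform's rule) shows the asymmetry:

```python
import math

def log_score(probability: float, outcome: int) -> float:
    """Logarithmic score (higher is better): the log of the probability
    assigned to the outcome that actually happened."""
    p = probability if outcome == 1 else 1 - probability
    return math.log(p)

# A claim that resolves False: the confident bluffer is punished far more
# heavily than the cautious, honest forecaster.
print(round(log_score(0.99, 0), 2))  # -4.61
print(round(log_score(0.60, 0), 2))  # -0.92
```

Repeat that over enough resolved questions and the bluffers either recalibrate or sink to the bottom of the table.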
2. As a recruitment tool, or to identify talent and build a pool of skilled forecasters who can be turned to in moments of uncertainty. During the recent pandemic, my superforecaster friends figured out lots of extremely useful things very early, and we weathered the pandemic well as a result. For example, when UK politicians were briefing against the health minister, saying he was deluded for believing that a vaccine was imminent, my superforecaster pal Jonny Kitson was letting everyone know that it was in fact inbound shortly and that we should plan accordingly. On masks, airborne transmission, quarantines, lockdown effects and the rest, forecasters were just worth listening to, well before institutional communications caught up to their insights. However, the majority of these useful views didn't emerge from their official forecasting platform contributions: they were posted on Twitter, in private briefing documents, in group chats, et cetera.
I think there's a parallel here with the original Olympic Games or medieval jousting competitions: tourneys are designed to identify people who are good at fighting, but they aren't a model of how you should actually do the fighting.
If you could wrestle and beat the best warriors in Athens in the nude, that would be a strong indication you'd be of value on the battlefield; but if you reacted to the Spartans coming over the hill by putting down your sword, stripping off your armour and anointing yourself with oil in order to wrestle with them, you'd not be doing the optimal thing.
Talking my own book:
I think there may be a way to combine the precision and accountability of forecasting tournaments with the usefulness of qualitative private briefings by well-calibrated forecasters: this is what we are exploring at the Swift Centre. We are focusing on helping forecasters share more of their thinking around a central topic via multiple subquestions and written comments, getting as close to "well-calibrated forecasters explain how they see it" as possible. We pick one big topic, like whether Russia will use a nuclear bomb in Ukraine, or the likely consequences if it did, and then leave the rest to our well-calibrated panel: figuring out subquestions, writing comments, and producing data for an expansive briefing, with an emphasis on the conditional questions which mirror the kinds of things decision-makers will need to think about.
If you're interested in what we are working on, have suggestions for us, or are curious to read our work, then sign up to our newsletter at swiftcentre.org or here on Substack.
Thanks to Byrne Hobart of The Diff for talking about this and prompting me to write this post.