Hivemind prediction markets case study: The Winton Climate Prediction Market

Mark Roulston
Hivemind
Oct 4, 2018

At the beginning of this year, students and academics at British universities were invited by Winton to participate in a prediction market for seasonal climate forecasting. The market was denominated in credits rather than cash, but Winton donated £55,000 in prizes for the institutions of the ten teams that finished the competition with the most credits. Forty-six teams signed up from sixteen universities, and twenty-four of these teams actively participated in the market. The market was a proving ground for the design of a prediction market platform which, earlier this year, Winton spun out, along with other research software tools, into the technology company Hivemind.

Design of the Outcome Space

The competition challenged teams to predict the monthly average maximum temperature for the UK and the monthly UK rainfall as determined and published by the Met Office. There were six separate markets, one for each month from April to September. Each market had a two-dimensional space of outcomes in which each outcome covered a range of 0.2°C in temperature and 5mm in rainfall. The space covered the range of 0 to 25°C in temperature and 0 to 200mm in rainfall giving a total of 5,207 distinct outcomes. Teams could define contracts consisting of sets of outcomes and then trade these contracts. Teams received 1.00 credit for each contract they held at settlement that included the outcome that actually occurred. The price at which each outcome traded could range between 0.00 and 1.00 and if the price reflected the fair value of the outcome it could be interpreted as the implied probability of that outcome occurring. Thus the market generated a two-dimensional probability distribution for monthly temperature and rainfall in the UK. This feature of Hivemind’s prediction markets contrasts with other prediction market designs which often only offer markets on simple binary outcomes.
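As a concrete illustration of how prices over an outcome grid yield a probability distribution, here is a minimal Python sketch. It is not the platform's code: it uses a simplified rectangular 125×40 grid (5,000 cells, whereas the actual market had 5,207 outcomes, so the real binning evidently differed slightly) and uniform prices purely for illustration.

```python
# Illustrative sketch (not the actual platform code): a 2-D outcome grid
# where each outcome's price, if fair, is its implied probability.
temp_bins = [round(0.2 * i, 1) for i in range(125)]  # 0.0-24.8 C in 0.2 C steps
rain_bins = [5 * j for j in range(40)]               # 0-195 mm in 5 mm steps

# Uniform prices are used here purely for illustration; in the market,
# prices move with trading.
n_outcomes = len(temp_bins) * len(rain_bins)
price = {(t, r): 1.0 / n_outcomes for t in temp_bins for r in rain_bins}

# A contract is a set of outcomes; it pays 1.00 credit if the realised
# outcome is in the set, so its fair value is the sum of outcome prices.
contract = {(t, r) for (t, r) in price if t >= 20.0 and r < 50}
fair_value = sum(price[o] for o in contract)

# Marginal implied probability for each temperature bin:
p_temp = {t: sum(price[(t, r)] for r in rain_bins) for t in temp_bins}
```

Summing prices over any set of outcomes gives the implied probability of that set, which is how the record-breaking-month probabilities shown later were extracted.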

Solving the Liquidity Problem

Many prediction markets use a continuous double auction mechanism which matches buyers and sellers directly. With over 5,000 outcomes per market this approach would not work well: teams could define their own contracts, and typically no two teams would ever be trading the same contract. This liquidity problem, which has thwarted previous attempts to create climate prediction markets, was solved by using a market maker that was always prepared to buy and sell contracts. The market-making algorithm was based on the Logarithmic Market Scoring Rule developed by Robin Hanson, which allows the market to extract information from very small numbers of participants. The market maker can expect to lose money (or, in this case, credits), but this loss is essentially a reward to the participants for providing good information. This is the opposite of the traditional market-maker model, in which the market maker aims to profit from participants.
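The mechanics of Hanson's Logarithmic Market Scoring Rule can be sketched in a few lines. This is a generic textbook implementation, not Hivemind's code; the liquidity parameter `b`, which controls how quickly prices move and bounds the market maker's worst-case loss at b·log(N) for N outcomes, is chosen arbitrarily here.

```python
import math

def lmsr_cost(q, b=100.0):
    """LMSR cost function C(q) = b * log(sum_i exp(q_i / b)).
    q is the vector of outstanding quantities of each outcome."""
    m = max(q)  # subtract the max before exponentiating, for stability
    return b * (m / b + math.log(sum(math.exp((qi - m) / b) for qi in q)))

def lmsr_price(q, i, b=100.0):
    """Instantaneous price of outcome i: exp(q_i/b) / sum_j exp(q_j/b).
    Prices are in (0, 1) and sum to 1, like probabilities."""
    m = max(q)
    expq = [math.exp((qj - m) / b) for qj in q]
    return expq[i] / sum(expq)

def trade_cost(q, i, amount, b=100.0):
    """Credits a trader pays the market maker to buy `amount` units of
    outcome i: the change in the cost function."""
    q_new = list(q)
    q_new[i] += amount
    return lmsr_cost(q_new, b) - lmsr_cost(q, b)
```

Because every trade is priced off the cost function, the market maker can always quote, so a lone trader with information can move the price without waiting for a counterparty; the bounded loss is the subsidy mentioned above.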

The market was initialised in early March with an auction. During a one-week bidding period, teams defined contracts and stated how many units of each contract they would be prepared to buy, and at what price. Once the bidding closed, an algorithm allocated contracts based on the value of the bids, irrespective of the order in which they were placed. This auction mechanism meant that Winton didn't need to set the initial prices itself, and it also avoided the problem of teams competing to snap up obviously underpriced contracts when trading began.
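The allocation algorithm itself is not described here, but the principle of value priority rather than time priority can be illustrated with a deliberately simplified sketch; the greedy fill over a single fixed supply is a hypothetical stand-in for the real mechanism.

```python
# Hypothetical illustration only: bids are ranked by price, not by
# arrival order, and filled from a fixed supply. The actual auction
# allocated set-based contracts and is not specified in this article.
def allocate(bids, supply):
    """bids: list of (team, units, price); supply: units available.
    Returns {team: units_filled}, filling the highest-priced bids first."""
    filled = {}
    remaining = supply
    for team, units, price in sorted(bids, key=lambda b: b[2], reverse=True):
        take = min(units, remaining)
        if take > 0:
            filled[team] = filled.get(team, 0) + take
        remaining -= take
    return filled
```

For example, with 8 units of supply and bids of 5 units each at prices 0.30, 0.10 and 0.20, the 0.30 bid fills completely and the 0.20 bid fills partially, regardless of which bid arrived first.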

Market dynamics

After the auction, real-time trading began, in which the market maker offered instant quotes that teams could choose to accept or reject. Initially, trading activity was concentrated in the market for April, in which over 2,000 separate trades were ultimately made. Around 2,500 trades were made in the May market, and over 1,000 trades were made in each of the markets from June to August. The September market saw high levels of programmatic trading, with over 10,000 trades. Typically, more than half of all trades and almost two-thirds of all volume (trades weighted by value) occurred within the month being traded. This makes sense: the flow of information accelerates once the month begins, as observations and medium-range forecasts become available.

The weekly volume (credit value of trades) traded on the prediction market for all open months.
The average percentage of total volume in each market traded on each day as a function of the forecast horizon (the number of days until the end of the month being traded). On average, more than 60% of all volume was traded within the month corresponding to the market.

There was a large variation in the activity of the participating teams. Over 95% of all trades were made by just 5 teams. A lot of these trades were made using the API which teams could use as an alternative to the user interface. The API allowed programmatic trading and also enabled more flexible definitions of contracts in which individual outcomes could be given arbitrary weights. The difference in trading styles between teams was akin to that found in financial markets in which some investors identify undervalued assets and buy and hold for long periods while others frequently change their positions hoping to exploit smaller pricing anomalies. Both approaches can yield good returns and both can contribute to information discovery.
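The weighted contracts enabled by the API can be thought of as a generalisation of set-based contracts: each outcome carries an arbitrary weight, and the contract pays the weight of whichever outcome occurs. A minimal sketch of valuing such a contract, with made-up outcome labels:

```python
# Sketch only; outcome names are hypothetical. A weighted contract pays
# the weight of the realised outcome, so its fair value under a set of
# prices is the price-weighted sum of the weights.
def weighted_contract_value(weights, prices):
    """weights, prices: {outcome: float}. Fair value at current prices."""
    return sum(w * prices[o] for o, w in weights.items())

prices = {"hot_dry": 0.2, "hot_wet": 0.1, "cold_dry": 0.3, "cold_wet": 0.4}
weights = {"hot_dry": 1.0, "hot_wet": 1.0, "cold_dry": 0.0, "cold_wet": 0.0}
```

A simple set-based contract is just the special case in which every weight is 0 or 1.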

The number of trades made by each team during the competition. Five teams were responsible for 95% of all trades.

The Summer of 2018 turned out to be a very interesting one for UK weather. May was the warmest on record (going back to 1910) while June and July were each the second warmest.

The evolution of the probability implied by the market that May 2018 would be the warmest May on record, beating the previous record set in 1992. This graph illustrates the type of probabilistic information that can be extracted from the market.
The evolution of the probability distribution implied by the market for June rainfall. June 2018 turned out to be the 9th driest June on record for the UK.
The evolution of the probability distribution implied by the market for July temperature. July 2018 ended up being the second warmest on record. The market began by assuming that July would not be particularly warm but increased its prediction when May and June turned out to be exceptionally warm. Once July began, the prediction increased again as observations and medium-range forecasts confirmed that the month would be one of the warmest on record.
The market prices for the July market on July 22. The prices can be interpreted as a joint probability distribution for temperature and rainfall. The market was predicting that July would turn out to be one of the warmest and driest on record. In fact, while it was the second warmest (22.6°C) it was only the 19th driest (55.3mm). Thundery rain in the last week of July prevented it from being as dry overall as the earlier part of the month.

The April and May markets were held open until the 15th day of May and June respectively when they were closed and settled using the numbers published by the Met Office. This meant, however, that there was a two week period in which the number had been published but the market could still be traded. Not surprisingly, during this time the price of the correct outcome rallied very close to 1.00 while the prices of other outcomes collapsed. Teams that were quick to move after publication could pick up near risk-free profits. In a real-money market this wouldn’t necessarily be a problem as these profits are ultimately coming from the market maker. However, due to the tournament structure of the academic competition, in which prizes were awarded for relative performance, these profits could potentially influence the gains of other participants. From June onwards markets were closed on the last day of the month, before the release of the official temperature and rainfall averages.

The evolution of the price of the outcome that ultimately occurred in each market. The April and May markets were held open until the 15th day of the following month so the price of the correct outcome went very close to 1.00 after the Met Office released the actual numbers. Subsequent markets were closed on the last day of the month, before the official release.

How good were the forecasts?

Although the main aim of the competition was to test the design of the market, the question of how good the forecasts were is obviously of interest. Making a direct comparison with publicly available seasonal forecasts is tricky because those forecasts are typically not presented in terms of the national monthly averages that the competition targeted. One of the features of a prediction market is that the decision maker creating the market can specify the event space in terms of the outcomes of interest; market participants then have the task of translating available information into that space. In this case, the academic teams had to figure out what seasonal and medium-range weather forecasts implied for average monthly temperature and rainfall in the UK.

The mean absolute error of the expected temperature implied by the market as a function of forecast horizon. The horizontal lines show the error if the forecast is the 1910–2017 average.
The mean absolute error of the expected rainfall implied by the market as a function of forecast horizon. The horizontal lines show the error if the forecast is the 1910–2017 average.

The implied expected temperature and rainfall converged on the correct answer during the course of trading. Most of this convergence occurred during the month of the market in question. As mentioned above, this is when there was a big increase in the flow of information in the form of observations and forecasts.

The mean logarithmic score of the probability forecasts implied by the market. The logarithmic score is the negative logarithm of the probability assigned to the correct outcome. Smaller values are better, with the minimum possible score of zero being attained if a probability of 1.00 is assigned to the correct outcome. The horizontal lines show the logarithmic score of a probability forecast based on the 1910–2017 climatology.

The prediction market didn’t just generate a point forecast for temperature and rainfall but a full joint probability distribution. Probability forecasts can be evaluated using scoring rules such as the logarithmic scoring rule. The logarithmic score of a probability forecast is the negative logarithm of the probability assigned to whatever outcome ultimately occurred. This score actually measures the information deficit of someone in possession of the forecast, so smaller values are better. The best possible value of the logarithmic score is zero and this is attained if the forecast assigns a probability of 1.00 to the correct outcome.

Long-range climate prediction

The purpose of the academic competition was to test the design of a prediction market that could be used for longer-range climate prediction. It demonstrated that motivated participants with domain knowledge can generate probability distributions of continuous variables, including information about the relationship between variables. The two-dimensional outcome space of monthly UK rainfall and temperature could be replaced by annual atmospheric carbon dioxide concentration and global temperature anomaly.

A long-range climate prediction market would ideally be open to a wider pool of traders. The participation of climate scientists would be vital but others should not be excluded. This would obviate the need to define who is and isn’t an expert and it would also mean that anyone disagreeing with the consensus implied by the market would be free to bet against it.

Beyond climate

The experience of the competition suggests that prediction markets of a similar design could have applications beyond climate. The use of a subsidised market maker means that this type of market can extract information from a relatively small pool of participants, assuming of course that collectively these participants possess information. Private prediction markets can be used by companies and other organisations to aggregate the knowledge of small groups of selected experts. These private markets can be structured so that they are not classed as gambling but are effectively a form of consulting with performance-based fees. It is these kinds of applications for which Hivemind's prediction market platform, Agora, is designed.
