Numerai’s New Tournament to Crowdsource the Future of the Stock Market

The traditional crowdsourced machine learning tournament depends on a holdout dataset: historical data known to the tournament organizer and unknown to the data scientists participating in the tournament. Data scientists' submissions are graded and paid based on their ability to predict this holdout dataset. This creates an incentive to predict the holdout set as closely as possible, but no incentive to build models that generalize to the future. Data scientists are rewarded for predicting the past. This incentivizes overfitting, the primary enemy of data-driven endeavors.

In data science, logloss is a standard metric for measuring how good a set of predictions is. Data science competitions use logloss to rank competitors. On each submission, a data scientist is given a public logloss indicating how well the predictions performed on the public holdout dataset. The major problem with this approach is that consistent feedback from the competition enables competitors to tailor their predictions to the feedback itself rather than to the actual problem. This enables the overfitting that holdout-based rewards incentivize.
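For readers unfamiliar with the metric, here is a minimal sketch of binary logloss. The function and the toy data are illustrative, not part of Numerai's scoring code; note that an uninformative model predicting 0.5 everywhere scores ln(2) ≈ 0.6931, which is why anything below that line counts as beating chance.

```python
import math

def logloss(y_true, y_prob, eps=1e-15):
    """Binary cross-entropy (logloss), averaged over predictions.

    Probabilities are clipped away from 0 and 1 to avoid log(0).
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to the open interval (0, 1)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Predicting 0.5 for every row gives the "random" baseline of ln(2):
print(logloss([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))  # ≈ 0.6931

# Confident, mostly-correct predictions score well below that baseline:
print(logloss([1, 0, 1, 1], [0.9, 0.2, 0.8, 0.6]))
```

Because logloss punishes confident wrong answers severely, chasing small public-logloss improvements on a fixed holdout set is exactly the feedback loop that drives overfitting.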

There have been many attempts to mitigate the overfitting that this tournament format incentivizes. Most approaches complicate the selection of holdout sets and diminish the usefulness of the logloss reported to data scientists. Rather than bringing together thousands of data scientists to achieve a good logloss on the past, Numerai's only interest is to predict the future.

Incentivizing Generalization

To align incentives with data scientists, Numerai no longer has a holdout dataset or a leaderboard, either public or private. Rather than hide information from the data scientists, Numerai gives data scientists all known information. Instead of grading data scientists on a fixed set of past data, data scientists are graded on future data once it becomes known. Four weeks after a tournament begins, the actual outcome of what was being predicted is known. Data scientists are then ranked and paid in both USD and Numeraire based solely on their ability to predict those four weeks. This makes the overfitting problem the direct adversary of the data scientists.

A paid round of the tournament, four weeks after the tournament began. The “Live Logloss” represents how well the model predicted those four weeks.

The above graph shows the logloss performance of an ensemble of the top user predictions in a round of the Numerai tournament. The orange line is the logloss expected of random predictions. Anything below it is a good prediction.

Here, all the data scientists' predictions of the future for a round of the Numerai competition are compared against a backtest (test logloss). Their backtest logloss and their actual future-outcome logloss (live logloss) are very similar, indicating the backtest was a good indicator of out-of-sample, future performance.

Rather than devising increasingly complex methods of concealing information to combat overfitting, we've crowdsourced the overfitting problem itself. The above graphs show not only that data scientists successfully predicted the future, but that their future success was itself predictable. Predictable predictions can be leveraged indefinitely.

To Better Predict the Future

Now that the data scientists in Numerai’s tournament are focused solely on generalization to the future, we’ve also released a new, human-readable feature to aid building models that are robust through time. The dataset now contains a column with time information that can be used to train models that strive for time-invariance.
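One way a time column can be used is era-wise validation: hold out one time period at a time and check that a model's score is stable across all of them, rather than strong in one period and weak in others. The sketch below is an assumption about how such a column might be exploited; the column name "era" and the row format are hypothetical, not Numerai's actual schema.

```python
from collections import defaultdict

def split_by_era(rows, era_key="era"):
    """Group rows by their time period so each era can serve as a held-out fold."""
    folds = defaultdict(list)
    for row in rows:
        folds[row[era_key]].append(row)
    return folds

def leave_one_era_out(rows, era_key="era"):
    """Yield (era, train_rows, validation_rows), holding out one era at a time.

    A model whose validation score is consistent across every held-out era
    is closer to time-invariant than one that only performs well on the
    periods it was trained on.
    """
    folds = split_by_era(rows, era_key)
    for era, val in folds.items():
        train = [r for e, fold in folds.items() if e != era for r in fold]
        yield era, train, val

# Toy usage with hypothetical rows:
rows = [
    {"era": 1, "feature": 0.1, "target": 1},
    {"era": 1, "feature": 0.4, "target": 0},
    {"era": 2, "feature": 0.7, "target": 1},
    {"era": 2, "feature": 0.2, "target": 0},
]
for era, train, val in leave_one_era_out(rows):
    print(era, len(train), len(val))  # each era held out once
```

Scoring a model per held-out era, and penalizing variance across eras, is one concrete way to "strive for time-invariance" as described above.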


Numeraire, our new cryptocurrency to coordinate machine intelligence, will be the final economic incentive layer against overfitting.

Learn more about Numeraire in our Film, Medium post, Smith and Crown, and Wired.

Silicon Doesn’t Sleep.