Over the last few months I’ve worked as an intern at Numerai. If you haven’t heard of Numerai, it’s a hedge fund that is innovating far beyond the status quo of the financial industry. We run a data science tournament where anyone in the world can download our training data and try to predict the future. If they do a good job, we will trade real world financial assets on their predictions, and pay them for their work in our own proprietary cryptocurrency. Numerai is tapping into a diverse and talented network of data scientists to build a multifaceted artificial intelligence, monopolizing the “wisdom of the crowd”. The only natural cap on this idea is the number of data scientists in the world.
But some of this is easier said than done. For starters, how do we decide which data scientists to pay? And then how much should each participant be paid? Answering these questions is of the utmost importance to Numerai. The game theory that underpins the tournament determines the quality of the machine learning models people use. Last month, I and several others came to the realization that our previous tournament structure, though strong in certain aspects, was not perfectly incentivizing the data scientists to build the best models they possibly could. Without getting into the specifics, users were discovering that they could maximize profit by using statistically bad machine learning models. In some tournaments, one or two users won all of the prize pool, even though they hadn’t submitted particularly good predictions. This had to change. Fast forward to today, and we have implemented a new tournament that we believe vastly improves the emergent properties of users’ staking behavior. It is designed to be more fair for the users, and to generate better data for our meta-model. So how does this game theory work? How can you, dear user, choose the best p (probability parameter) to maximize earnings?
Before reading further, if you are not familiar with the new tournament, or have not participated in the Numerai competition before, I would recommend checking out the rules here. It may also be good to read up on expected value if you are new to the concept.
Without further ado:
There are two main factors when considering how to play the tournament optimally. Firstly, a user will want to make sure their chosen p is less than or equal to their actual probability of beating the log loss threshold, denoted p-hat. This is because we have designed the payment structure such that the expected value of any bet breaks even at p = p-hat. The higher a user sets p, the worse the odds they get, and so the more often they must beat the threshold to be profitable.
According to our rules, a user who beats the log loss threshold wins ((1-p)/p)*stake. For instance, if a user succeeds 50% of the time and chooses p = 0.9, they stand to win ((1-0.9)/0.9)*stake = (1/9)*stake. When they lose, we burn their entire stake. So the user expects to win (1/9)*stake 50% of the time and to lose their full stake the other 50% of the time. In the long run their expected value is 0.5*(1/9)*stake - 0.5*stake, or roughly (-0.444)*stake per bet. This is no good: playing this way, a user will not be profitable. At p = p-hat the long-run expected value comes out to zero, and when p is lower than p-hat it is greater than zero. To visualize this:
The region where expected value is positive is shaded. Remember that p and p-hat must be between 0 and 1. Here we can see that with the profit equation ((1-p)/p)*stake, the break-even point is p = p-hat, so p ≤ p-hat is required to avoid losing money over time.
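For readers who prefer to check the arithmetic in code, here is a minimal sketch of the expected value calculation described above. The function names (win_payout, expected_value) are made up for this illustration and are not part of any Numerai API. The only rules assumed are the ones stated in this post: a win pays ((1-p)/p)*stake and a loss burns the whole stake.

```python
def win_payout(p: float, stake: float) -> float:
    """Amount won on a successful bet: ((1 - p) / p) * stake."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return (1.0 - p) / p * stake


def expected_value(p: float, p_hat: float, stake: float = 1.0) -> float:
    """Long-run expected profit per bet.

    The user wins ((1 - p) / p) * stake with probability p_hat
    and loses the entire stake with probability 1 - p_hat.
    Algebraically this simplifies to ((p_hat - p) / p) * stake.
    """
    return p_hat * win_payout(p, stake) - (1.0 - p_hat) * stake


if __name__ == "__main__":
    # A user who really beats the threshold 50% of the time (p_hat = 0.5)
    # tries a few different reported values of p.
    for p in (0.3, 0.5, 0.7, 0.9):
        print(f"p={p:.1f}  EV per unit stake = {expected_value(p, 0.5):+.3f}")
```

With p_hat = 0.5, the p = 0.9 row reproduces the roughly -0.444 figure from the example, the p = 0.5 row is zero, and anything below 0.5 is positive. The simplified form ((p-hat - p)/p)*stake makes the break-even point at p = p-hat easy to see.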
So why don’t users just set their p’s as low as possible? Wouldn’t this maximize profit? Actually, this is not game theory optimal. The lower p is set, the less likely a user is to be eligible for the prize pool. We pay out the highest p’s first, so it is undesirable to be below the eligibility-cutoff p. Additionally, there is no actual benefit in underreporting p. Because we pay out everyone in the prize pool at the same p value, and that value is the lowest p we can use, underreporting p will not improve the p at which one is paid. If a user would be in the prize pool at p=0.7, and at 0.6, and we end up paying everyone in the prize pool out at a p of 0.55, then choosing 0.6 over 0.7 provides no benefit. It does, however, increase the chance that the user is not in the prize pool at the time of payout, because there may be many submissions with higher p’s above them on the payout list. The risk of using a p lower than p-hat is not offset by the potential reward, because it can only ever, at best, marginally improve the p used in payouts, but can greatly jeopardize one’s chances of being in the prize pool. Game theory optimal play can be achieved by setting p as close as possible to the real probability of beating the threshold, p-hat.
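To make the ordering argument concrete, here is a toy sketch of a payout list. It is a deliberate simplification and not our actual settlement code. It assumes every listed bet beat the threshold, that bets are admitted whole (no partial fills), and that a bet is admitted only if paying every admitted stake at that bet's p still fits in the prize pool. The settle function, the Bet tuple, and the numbers are all hypothetical.

```python
from typing import List, Tuple

Bet = Tuple[str, float, float]  # (user, reported p, stake) -- illustrative only


def settle(bets: List[Bet], prize_pool: float) -> Tuple[List[str], float]:
    """Return (users admitted to the prize pool, common payout p).

    Assumed rule: walk the bets from highest p to lowest and keep admitting
    them while paying ALL admitted stakes at the newest (lowest) p would
    cost no more than the prize pool.
    """
    ordered = sorted(bets, key=lambda b: b[1], reverse=True)
    admitted: List[str] = []
    total_stake = 0.0
    payout_p = 1.0  # p = 1 pays nothing; used only if nobody is admitted
    for user, p, stake in ordered:
        cost = (total_stake + stake) * (1.0 - p) / p
        if cost > prize_pool:
            break  # this bet and everything below it misses the pool
        admitted.append(user)
        total_stake += stake
        payout_p = p
    return admitted, payout_p


if __name__ == "__main__":
    others = [("b", 0.6, 100.0), ("c", 0.55, 100.0), ("d", 0.4, 100.0)]
    # User "a" tries reporting different values of p against the same field.
    for a_p in (0.7, 0.6, 0.4):
        print(a_p, settle([("a", a_p, 100.0)] + others, prize_pool=300.0))
```

In this toy field, user "a" is paid at p = 0.55 whether they report 0.7 or 0.6, so shading down gains nothing. At 0.4 they fall below the cutoff and drop out of the prize pool entirely.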
It is worth noting that a participant with a sufficiently large stake can be guaranteed to deplete the prize pool at any p they choose. To maximize expected profit, it seems that they may occasionally want to choose a p lower than p-hat. However, a participant with equal funds and an equally strong model could come in above them and completely exclude them from the prize pool. Due to this adversarial dynamic, every participant, regardless of capital resources, is incentivized to make the best model possible, and then report it as honestly as possible. Thus game theory optimal play is achieved when p is the closest possible approximation to p-hat.
We hope this is as fun to play with as it was to design!