Information Highway Bandits
What slot machines and one-armed bandits in Las Vegas can teach us about how to optimise finance, drug discovery and the web itself
A few years ago I was in Las Vegas to visit the “Consumer Electronics Show”, one of the world’s largest trade fairs, where the latest electronic products are presented. But what struck me most, in a city known for its big casinos, was that you could find slot machines even in the hotel elevators! Seeing these “one-armed bandits” made me think of a famous problem from probability theory, the “multi-armed bandit problem”.
Here’s how it works. Assume that you are sitting in front of a row of slot machines and that each one is configured differently (meaning that some machines are more likely to produce a winning combination than others). You have only a limited amount of money, and you want to win as much as possible over the next couple of hours. Which machine will you play? And will you stick with the same machine the whole time, or switch once in a while?
Of course, you first need to find out which machine pays out the most. A simple approach is to put an identical number of coins into each machine at the beginning, and then to play only the one that returned the best rewards. Unfortunately, there is no guarantee that you will find the best machine this way: if you start with 5 coins per machine, you are less likely to identify the right one than if you start with 1000 coins per machine.
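To make this simple approach concrete, here is a minimal sketch in Python. It assumes each machine pays out exactly one coin with a fixed probability, which is a simplification of a real slot machine; the function names `explore_then_commit` and `identification_rate` and all parameters are hypothetical and chosen only for illustration.

```python
import random


def explore_then_commit(win_probs, coins_per_machine, total_coins, seed=0):
    """Put the same number of coins in every machine, then play only the
    machine that returned the most wins. Returns (chosen machine, total wins)."""
    n = len(win_probs)
    if coins_per_machine * n > total_coins:
        raise ValueError("not enough coins for the initial test phase")
    rng = random.Random(seed)
    wins = [0] * n
    total_reward = 0

    # Test phase: an identical number of coins in each machine.
    for machine in range(n):
        for _ in range(coins_per_machine):
            reward = 1 if rng.random() < win_probs[machine] else 0
            wins[machine] += reward
            total_reward += reward

    # Commit phase: all remaining coins go into the best-looking machine
    # (ties are resolved in favour of the machine with the lowest index).
    chosen = max(range(n), key=lambda m: wins[m])
    for _ in range(total_coins - coins_per_machine * n):
        total_reward += 1 if rng.random() < win_probs[chosen] else 0
    return chosen, total_reward


def identification_rate(win_probs, coins_per_machine, trials=100, seed=0):
    """Fraction of repeated experiments in which the test phase alone
    picks the machine that is truly the best."""
    best = max(range(len(win_probs)), key=lambda m: win_probs[m])
    budget = coins_per_machine * len(win_probs)
    hits = 0
    for t in range(trials):
        chosen, _ = explore_then_commit(win_probs, coins_per_machine, budget, seed=seed + t)
        hits += chosen == best
    return hits / trials
```

Calling `identification_rate([0.4, 0.5, 0.6], 5)` and `identification_rate([0.4, 0.5, 0.6], 1000)` lets you compare how often a 5-coin test and a 1000-coin test single out the third machine. The win probabilities here are made-up values, not measurements from real machines.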
It turns out that, no matter how many coins you have, this approach is too rigid, and that an optimal strategy is much more dynamic. In other words, you do want to assess initially which slot machine is the best. But then, while playing most of your coins in this first chosen machine, an ideal strategy continues to spend a few coins on the other machines too (especially those with high potential). And if it turns out during the game that another machine is more rewarding, you switch your preferred machine to that one. You continue like this until the end.
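One common and simple way to sketch this dynamic behaviour is the so-called epsilon-greedy heuristic: most coins go into the machine that currently looks best, while a small fraction `epsilon` of coins is spent on a randomly chosen machine. This is only an illustration under the same one-coin-payout assumption, not the optimal strategy itself; in particular, it explores the other machines uniformly rather than favouring those with high potential. The function name and parameters are hypothetical.

```python
import random


def epsilon_greedy(win_probs, total_coins, epsilon=0.1, seed=0):
    """Play mostly the machine with the best observed win rate, but spend
    a fraction `epsilon` of the coins on a random machine.
    Returns (coins played per machine, total wins)."""
    rng = random.Random(seed)
    n = len(win_probs)
    pulls = [0] * n
    wins = [0] * n
    total_reward = 0

    def estimate(machine):
        # Machines not yet played get an optimistic estimate of 1.0,
        # so that every machine is tried at least once.
        return wins[machine] / pulls[machine] if pulls[machine] else 1.0

    for _ in range(total_coins):
        if rng.random() < epsilon:
            machine = rng.randrange(n)            # keep testing the others
        else:
            machine = max(range(n), key=estimate)  # play the current favourite
        reward = 1 if rng.random() < win_probs[machine] else 0
        pulls[machine] += 1
        wins[machine] += reward
        total_reward += reward
    return pulls, total_reward
```

For example, `epsilon_greedy([0.2, 0.5, 0.8], 5000)` returns how many coins went into each machine. Because the favourite is re-evaluated after every coin, the preferred machine can change during the game whenever another machine's observed win rate overtakes it.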
Over the last decades, this problem has turned out to have applications in a surprising number of domains. For example, in medicine the slot machines could be experimental drugs that one wants to test, and the goal is to discover the most effective drug with the smallest number of clinical trials. A more recent example concerns online advertising. It is entirely feasible today to roll out multiple versions of an online advertisement to subsets of users in parallel, measure the results, and then see which one works best!
However, being in this Las Vegas elevator also made me realise another thing. No matter how well you optimise your game, the strategy of the “one-armed bandits” (as they are rightly called!) is still better than ours. After all, they’re still there and make quite some money from us, and not the other way round.