Predicting the Mega Millions with Gaussian Naïve Bayes

Llewellyn Jones
5 min read · Jul 1, 2019

Some time ago, when I was poor, had too much spare time, and the Mega Millions lottery payout was at some historic high, I tried looking at whether there were any trends to be gleaned from the winning numbers.

After scraping the site, lo and behold, there was something interesting going on. Not enough to give up work and become a professional gambler, but something worth looking into.

Well, Data.gov has a data set of Mega Millions winning numbers from the New York State Lottery going back almost a decade, and the oddities are still there.

For example, have a look at the plot of just one ball (of six):

Besides the range of possible numbers changing over time, the distribution isn’t uniform. Substantially so.

Looking at the histograms for each ball, they’re almost normally distributed, which suggests that the random algorithm (or a basket full of ping-pong balls) isn’t all that random.

It doesn’t mean that numbers will be easy to guess but maybe there’s something you can do with it.

So just to start out, I wanted to look at the raw probabilities as a baseline, because those alone could be a decent way of guessing numbers.

For the first ball, there’s a 9% chance of getting a 2, which is substantial considering a uniform distribution would give roughly 1/46 (different balls have different ranges), or about 2%.

I set up a primitive train/test split for this baseline, comparing two strategies: always picking the most probable numbers, and drawing from a random sampler weighted by those probabilities (np.random.choice).
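Roughly, that comparison looked something like the sketch below. The file name, column names, and split size are placeholders for however you load the Data.gov draws, so treat them as assumptions.

import numpy as np
import pandas as pd

# Hypothetical loading step: one column of historical winning numbers per ball
draws = pd.read_csv("mega_millions.csv")
train, test = draws.iloc[:-200], draws.iloc[-200:]

# Empirical probability of each number for the first ball
probs = train["ball_1"].value_counts(normalize=True)

# Strategy 1: always guess the single most common number
top_hits = (test["ball_1"] == probs.index[0]).mean()

# Strategy 2: sample guesses weighted by the historical probabilities
picks = np.random.choice(probs.index, size=len(test), p=probs.values)
weighted_hits = (test["ball_1"] == picks).mean()

print(top_hits, weighted_hits)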

Just choosing the top probabilities easily beat the weighted random sampler. The results showed that buying tickets covering the top 20 most likely numbers for each ball would ensure getting a match.

When compounded over the permutations, that’s still a lot of money to spend (n!/(n−r)!), but it’s much better than 40+ numbers, or 78 million+ permutations.
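To put those counts in context, here’s a quick back-of-the-envelope check using the same n!/(n−r)! permutation count. Reading the figures above as a 40-number pool versus a top-20 pool is my assumption.

from math import perm  # Python 3.8+

# Ordered arrangements of the 5 main balls from a pool of n numbers: n!/(n-r)!
print(perm(40, 5))  # 78,960,960  -> the "78 million+" figure for a 40-number pool
print(perm(20, 5))  # 1,860,480   -> restricting each ball to its top 20 numbers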

Gaussian Naive Bayes to the Rescue

I tried a few different models just to see if anything might work in this situation, maybe by sheer luck. But linear regressions and time series appeared to be substantially worse than the baseline.

But after thinking about it more, this is the sort of thing that Bayesian analysis is meant for. We have some probabilities, we have historic values, and the relationship isn’t going to be linear or logarithmic. I have little additional information, so I need a naïve approach. Plus the distributions are roughly Gaussian.

Maybe decision trees could work as well if future values are based on previous ones, or maybe the NSA has something to reverse engineer random number generation used in encryption, but I went with Bayes.

With Gaussian naive Bayes, the model assumes each feature follows a Gaussian (normal) distribution, computes the likelihood of a value from that distribution’s mean and variance, and then combines those likelihoods with the class prior using Bayes’ rule:

P(L | features) = P(features | L) · P(L) / P(features)
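To make that concrete, here’s a toy, hand-rolled version of the update for a single feature and two classes. The means, variances, and priors are invented purely for illustration; sklearn does all of this for you.

import numpy as np

# Made-up per-class statistics: mean and variance of the feature, plus the prior P(L)
stats = {
    2:  {"mean": 10.0, "var": 9.0,  "prior": 0.09},
    17: {"mean": 25.0, "var": 16.0, "prior": 0.02},
}
x = 12.0  # observed feature value

def gaussian_pdf(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Numerator of Bayes' rule: P(x | L) * P(L)
scores = {L: gaussian_pdf(x, s["mean"], s["var"]) * s["prior"] for L, s in stats.items()}

# Divide by P(x), the sum over classes, to get proper posteriors
total = sum(scores.values())
posteriors = {L: score / total for L, score in scores.items()}
print(posteriors)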

On a lark, I defined the independent variables as a moving window of previous values, with the dependent target being the next value for each ball. There could be some interaction (or pseudo-interaction) between the balls, but based on the histograms they looked completely unrelated to each other.

Maybe there is something wrong with this, and it might somehow skew the independence of my variables, but let’s give it a shot.
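Building those windowed features might look something like this sketch; the window size and column name are assumptions.

import numpy as np

def make_windows(series, window=10):
    """Each row of X holds the previous `window` draws; y is the draw that followed."""
    values = np.asarray(series)
    X = np.array([values[i:i + window] for i in range(len(values) - window)])
    y = values[window:]
    return X, y

# e.g. for the first ball (column name as above)
X, y = make_windows(draws["ball_1"], window=100)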

from sklearn.naive_bayes import GaussianNB

# Fit on the windowed features and predict back on the same (training) data
gnb = GaussianNB()
y_pred = gnb.fit(X, y).predict(X)

It also takes a priors parameter for specifying weighted class probabilities, which seems relevant to what I’m going for, and it’s something I may come back to, but for now I went with the defaults.
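If I do come back to it, passing the baseline frequencies in as priors might look roughly like this. It’s a sketch building on the earlier snippets; the priors have to line up with np.unique(y) and sum to 1.

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical: reuse the baseline frequencies as class priors instead of
# letting GaussianNB estimate them from the training labels alone
classes = np.unique(y)
baseline = train["ball_1"].value_counts(normalize=True)
priors = np.array([baseline.get(c, 0.0) for c in classes])
priors = priors / priors.sum()

gnb_weighted = GaussianNB(priors=priors).fit(X, y)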

And simply with that, I was getting 15% prediction accuracy on the train set. For other modeling situations that might be terrible, especially on just the train set, but this is a very erratic series that would otherwise call for something like Fourier transforms. And winning the lottery 15% of the time would be pretty nice to have.

What’s different about the Bayesian prediction compared to our baseline is that, with the baseline, you can choose the top 10 or so probabilities, while the Bayesian model only predicts one number. So there is no way to hedge your bets by buying multiple tickets, and it would be nice to get the model as accurate as possible.

Looping through a train-test split, it turns out certain balls have much worse accuracy than others. Changing the size of the window changes the prediction, although not by much, so I accounted for the difference by tuning the training window for each ball: Ball #1 worked best with a window of 100 numbers, #4 with 400.
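That per-ball tuning was essentially a small grid search over window sizes, something like the sketch below (it reuses make_windows from above; the candidate windows and the split are assumptions).

from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hypothetical search over window sizes for a single ball
accuracies = {}
for window in (30, 100, 400, 800):
    X_w, y_w = make_windows(draws["ball_1"], window=window)
    X_tr, X_te, y_tr, y_te = train_test_split(X_w, y_w, test_size=0.2, shuffle=False)
    accuracies[window] = GaussianNB().fit(X_tr, y_tr).score(X_te, y_te)

best_window = max(accuracies, key=accuracies.get)
print(best_window, accuracies)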

In the end, the accuracy wasn’t bad. For certain numbers it was very good. A substantial improvement on the baseline model:

Test Set Accuracy (training window)

Ball #1:   0.6  (100)
Ball #2:   0.25 (7)
Ball #3:   1.0  (800)
Ball #4:   0.35 (500)
Ball #5:   0.35 (30)
Mega Ball: 0.45 (1200)

That’s for guessing each ball individually. But we want multiple matches in each play. So how much are we winning on each ticket?

One time, it guessed every number plus the Mega Ball! Maybe we’re on to something. Well, if I get rich, I’ll make sure my life won’t change too much.
