How to Accurately Guess 3 Lottery Numbers Out of 6 Using an LSTM Model

Roi Polanitzer
9 min read · May 27, 2022


Photo Credit: www.istockphoto.com

Motivation

To predict at least 3 of the 6 winning numbers (the variable y) in an Israeli general lottery game, I chose the Israeli general lottery dataset sourced from Mifal HaPais. It is based on the results of Israeli general lottery games between September 1968 and May 2022, has many characteristics suitable for learning, and can be downloaded from here.

Artificial Neural Networks (Multilayer Perceptrons)

Artificial neural networks are also referred to as multilayer perceptrons. The perceptron model was created in 1958 by the American psychologist Frank Rosenblatt. Its simple structure allows it to learn basic binary patterns from a series of inputs, loosely simulating how a human brain learns. A multilayer perceptron is the classic neural network model, consisting of more than two layers of such units.
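To make this concrete, here is a minimal sketch of a multilayer perceptron in Keras. It is my own illustration rather than part of the original model, and it assumes a toy binary-classification task with 6 input features; the layer sizes are arbitrary.

# A minimal MLP sketch (illustrative only): one hidden layer between input and output
from keras.models import Sequential
from keras.layers import Dense

mlp = Sequential()
mlp.add(Dense(16, activation='relu', input_shape=(6,)))  # hidden layer
mlp.add(Dense(1, activation='sigmoid'))                  # output layer for a binary decision
mlp.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
mlp.summary()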

Recurrent Neural Networks

Recurrent neural networks (RNNs) were designed for predicting sequences. LSTM (long short-term memory) is a popular RNN architecture with many possible use cases (a minimal many-to-one sketch follows the list below):

When to use:

  1. One to one: a single input mapped to a single output.
    e.g., image classification
  2. One to many: a single input mapped to a sequence of outputs.
    e.g., image captioning (multiple words from a single image)
  3. Many to one: a sequence of inputs produces a single output.
    e.g., sentiment analysis (binary output from multiple words)
  4. Many to many: a sequence of inputs produces a sequence of outputs.
    e.g., video classification (splitting the video into frames and labeling each frame separately)
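This post uses the many-to-one pattern: a window of past draws is the input sequence and the next draw is the single output. The sketch below is my own illustration of that input/output shape in Keras; the layer size is arbitrary and it is not the model built later in the post.

# Many-to-one toy sketch: 7 time steps of 6 features map to one 6-number output
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

toy = Sequential()
toy.add(LSTM(32, input_shape=(7, 6)))  # return_sequences=False by default -> one vector per sequence
toy.add(Dense(6))                      # single output: the next draw's 6 numbers
toy.compile(optimizer='adam', loss='mse')
print(toy.predict(np.zeros((1, 7, 6))).shape)  # (1, 6)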

Business Understanding

The Israeli general lottery game is called Lotto. Lotto is a weekly game in which the participant chooses 6 numbers out of 37 plus one additional number out of 7. Mifal HaPais draws 6 numbers out of 37 and 1 number out of 7, and the maximum prize is paid for matching all of them.

Various options allow the user to bet double, play random numbers, 5 numbers plus a random number, or all combinations of 7–12 numbers. The drawings are held once on Tuesday and once on Saturday, with occasional drawings on Thursday.

The prize pool is a minimum of ₪5,000,000 and a maximum of ₪80,000,000.
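For intuition, the odds of matching everything can be worked out directly from the rules above. The following back-of-the-envelope calculation is my own illustration, not a figure published by Mifal HaPais.

# Probability of matching all 6 of 37 numbers plus the extra 1 of 7 (illustrative)
from math import comb

full_match = 1 / (comb(37, 6) * 7)
print(comb(37, 6))          # 2,324,784 ways to choose 6 numbers out of 37
print(f"{full_match:.2e}")  # roughly 1 in 16.3 million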

Data Understanding

First, let’s import the relevant libraries and packages:

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense, Bidirectional, Dropout

Second, let's load the required dataset using pandas' read_csv function.

df = pd.read_csv("IsraeliLottery.csv")

We can take a closer look at the data with the help of pandas' head() function, which returns the first five observations.

df.head()

Similarly, tail() returns the last five observations.

df.tail()

The dataset provides the lottery results information. It includes 4,047 records and 8 fields.

print(df.shape)

(4047, 8)

After we have loaded the dataset, we might want to know a little bit more about it. We can check attribute names and datatypes using info().

df.info()

No column has null or missing values.

The describe() function in pandas is convenient for getting various summary statistics. It returns the count, mean, standard deviation, minimum and maximum values, and the quantiles of the data.

df.describe()

The count, mean, min and max rows are self-explanatory. The std shows the standard deviation, and the 25%, 50% and 75% rows show the corresponding percentiles.

Data Processing

There are a couple of features that we do not need, namely "Game" and "Date", so we will drop them.

df.drop(['Game', 'Date'], axis=1, inplace=True)
df.head()

Deep learning algorithms expect all input features to vary on a similar scale, and ideally to have a mean of 0 and a variance of 1. We must re-scale our data so that it fulfills these requirements.

scaler = StandardScaler().fit(df.values)
transformed_dataset = scaler.transform(df.values)
transformed_df = pd.DataFrame(data=transformed_dataset, index=df.index)

Let’s check out our scaled data:

transformed_df.head()
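As a quick sanity check (my own addition), each scaled column should now have a mean close to 0 and a standard deviation close to 1:

# Verify the StandardScaler output: mean ~ 0, std ~ 1 per column
print(transformed_df.mean().round(3))
print(transformed_df.std().round(3))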

Now, let’s define several variables:

# All our games
number_of_rows = df.values.shape[0]
number_of_rows

4047

# Number of past games used as input for each prediction
window_length = 7
window_length

7

# Number of balls per game (columns in the data)
number_of_features = df.values.shape[1]
number_of_features

6

Next, let's create X and y from our scaled data. X should have the format a Keras LSTM model expects: (samples, window size, balls).

X = np.empty([number_of_rows - window_length, window_length, number_of_features], dtype=float)
y = np.empty([number_of_rows - window_length, number_of_features], dtype=float)

# Each sample X[i] is a window of 7 consecutive games; its label y[i] is the game that follows
for i in range(0, number_of_rows - window_length):
    X[i] = transformed_df.iloc[i : i+window_length, 0 : number_of_features]
    y[i] = transformed_df.iloc[i+window_length : i+window_length+1, 0 : number_of_features]

Let’s check out X shape:

X.shape

(4040, 7, 6)

Let’s check out y shape:

y.shape

(4040, 6)

Let's check out our first scaled sample (which is made of 7 consecutive lottery games):

X[0]

Let's check out our first scaled label:

y[0]

Let's check out the second scaled sample (which is made of 7 consecutive lottery games):

X[1]

Let's check out our second scaled label:

y[1]
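To convince ourselves the windowing is wired up correctly, a small check (my own addition) is to confirm that y[0] is simply the scaled row that immediately follows the first window, and that the same row appears as the last step of the second window:

# The first label should equal scaled game number window_length (row index 7)
print(np.allclose(y[0], transformed_df.iloc[window_length].values))      # expected: True
print(np.allclose(X[1][-1], transformed_df.iloc[window_length].values))  # expected: True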

Modeling

First, let’s initialise the RNN

model = Sequential()

Let’s add the input layer and the LSTM layer

model.add(Bidirectional(LSTM(240, input_shape = (window_length, number_of_features), return_sequences = True)))

Let’s add a first Dropout layer in order to reduce overfitting

model.add(Dropout(0.2))

Let’s add a second LSTM layer

model.add(Bidirectional(LSTM(240, input_shape = (window_length, number_of_features), return_sequences = True)))

Let’s add a second Dropout layer

model.add(Dropout(0.2))

Then, let’s add a third LSTM layer

model.add(Bidirectional(LSTM(240, input_shape = (window_length, number_of_features), return_sequences = True)))

Now, let’s add a fourth LSTM layer

model.add(Bidirectional(LSTM(240, input_shape = (window_length, number_of_features), return_sequences = False)))

Next, let’s add a dense layer

model.add(Dense(59))

Finally, let’s add the last output layer

model.add(Dense(number_of_features))

Now, let's compile the RNN

from tensorflow import keras
from tensorflow.keras.optimizers import Adam
model.compile(optimizer=Adam(learning_rate=0.0001), loss ='mse', metrics=['accuracy'])

Next, let’s train our LSTM model

model.fit(x=X, y=y, batch_size=100, epochs=300, verbose=2)
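If you want to monitor convergence, Keras' fit() returns a History object. Assigning it, as in this equivalent call (my own addition), lets you inspect the loss per epoch afterwards:

# Optional: keep the returned History object to inspect the training loss
history = model.fit(x=X, y=y, batch_size=100, epochs=300, verbose=2)
print(history.history['loss'][-5:])  # loss values of the last five epochs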

Evaluation

Let’s take the results of the last 8 Israeli general lottery games:

to_predict = df.tail(8)
to_predict

Let's remove the last row from those 8 games:

to_predict.drop([to_predict.index[-1]], axis=0, inplace=True)
to_predict

We got exactly the 7 last games before the May 24th, 2022 lottery game.

Now, let's take the results of the May 24th, 2022 lottery game and place them into a variable called "prediction":

prediction = df.tail(1)
prediction

Next, we have to change the format of our last 7 games from a DataFrame to a NumPy array in order to feed them into our model:

to_predict = np.array(to_predict)
to_predict

Then, we have to re-scale those 7 games

scaled_to_predict = scaler.transform(to_predict)
scaled_to_predict

Now, let's predict the results (i.e., the 6 numbers) of the May 24th, 2022 lottery game based on those 7 games:

y_pred = model.predict(np.array([scaled_to_predict]))
print("The predicted numbers in the last lottery game are:", scaler.inverse_transform(y_pred).astype(int)[0])

The predicted numbers in the last lottery game are: [ 5 9 12 20 23 35]

Let's see what the real results of the May 24th, 2022 lottery game were:

prediction = np.array(prediction)
print("The actual numbers in the last lottery game were:", prediction[0])

The actual numbers in the last lottery game were: [ 6 10 13 20 23 35]

3 numbers out of 6, not bad at all! Especially considering that there was not supposed to be any pattern in the data, that is, the numbers should be 100% random.
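If you want to score this programmatically rather than by eye, a simple hit count (my own addition, reusing the variables defined above) compares the two sets of numbers:

# Count how many predicted numbers appear in the actual draw
predicted = set(scaler.inverse_transform(y_pred).astype(int)[0])
actual = set(prediction[0])
hits = predicted & actual
print(f"{len(hits)} of 6 numbers matched: {sorted(hits)}")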

Your Turn!

Hopefully, this post gives you a good idea of what a deep learning sequential project looks like. As you can see, much of the work is in the data wrangling and the preparation steps, and these procedures consume most of the time spent on deep learning.

Now it’s time to get out there and start exploring and cleaning your data. Try two or three algorithms, and let me know how it goes.

Source code that created this post can be found here. I would be pleased to receive feedback or questions on any of the above.

About the Author

Roi Polanitzer, FRM, F.IL.A.V.F.A., CFV

Roi Polanitzer, CFV, QFV, FEM, F.IL.A.V.F.A., FRM, CRM, PDS, is a well-known authority in Israel in the field of business valuation and has written hundreds of papers that articulate many of the concepts used in modern business valuation around the world. Mr. Polanitzer is the Owner and Chief Appraiser of Intrinsic Value — Independent Business Appraisers, a business valuation firm headquartered in Rishon LeZion, Israel. He is also the Owner and Chief Data Scientist of Prediction Consultants, a consulting firm that specializes in advanced analysis and model development.

Over more than 17 years, he has performed valuation engagements for mergers and acquisitions, purchase price allocation (PPA) valuations, goodwill impairment test valuations, embedded option and real option valuations, employee stock option (ESOP) valuations, common stock valuations (409A), splitting equity components and complicated equity/liability instrument valuations (PWERM / CCM / OPM), contingent liability, guarantees and loan valuations, independent expert opinions for litigation purposes, damage quantifications, balancing resources between spouses due to divorce proceedings and many other kinds of business valuations. Mr. Polanitzer has testified in courts and tribunals across the country and from time to time participates in mediation proceedings between spouses.

Mr. Polanitzer holds an undergraduate degree in economics and a graduate degree in business administration, majoring in finance, both from the Ben-Gurion University of the Negev. He is a Full Actuary (Fellow), a Corporate Finance Valuator (CFV), a Quantitative Finance Valuator (QFV) and a Financial and Economic Modeler (FEM) from the Israel Association of Valuators and Financial Actuaries (IAVFA). Mr. Polanitzer is the Founder of the IAVFA and currently serves as its chairman.

Mr. Polanitzer’s professional recognitions include being designated a Financial Risk Manager (FRM) by the Global Association of Risk Professionals (GARP), a Certified Risk Manager (CRM) by the Israel Association of Risk Managers (IARM), as well as being designated a Python Data Analyst (PDA), a Machine Learning Specialist (MLS), an Accredited in Deep Learning (ADL) and a Professional Data Scientist (PDS) by the Professional Data Scientists’ Israel Association (PDSIA). Mr. Polanitzer is the Founder of the PDSIA and currently serves as its CEO.

He is the editor of IAVFA’s weekly newsletter since its inception (primarily for the professional appraisal community in Israel).

Mr. Polanitzer develops and teaches business valuation professional trainings and courses for the Israel Association of Valuators and Financial Actuaries, and frequently speaks on business valuation at professional meetings and conferences in Israel. He also developed IAVFA’s certification programs in the field of valuation and he is responsible for writing the IAVFA’s statement of financial valuation standards.
