Detecting fake banknotes using TensorFlow

TensorFlow is an open-source library built by Google, widely used in the field of machine learning and deep learning. The library is popular for its use of data-flow graphs to carry out numerical computations.
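As a minimal illustration of this idea (a small sketch using the TensorFlow 1.x API, which the rest of this article also uses), we first describe a computation as a graph, and only then execute it inside a session:

import tensorflow as tf

# Building the graph: no computation happens yet
a = tf.constant(2.0)
b = tf.constant(3.0)
total = a + b

# Running the graph inside a session produces the actual value
with tf.Session() as sess:
    print(sess.run(total)) # 5.0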

Today, let’s use TensorFlow to build an artificial neural network that detects fake banknotes.

The dataset

Our dataset is a CSV file that contains information extracted from (wavelet transformed) images of banknotes. There are 1,372 banknotes, each with the following attributes:

  1. Image.Var (Variance of Wavelet Transformed image (WTI))
  2. Image.Skew (Skewness of WTI)
  3. Image.Curt (Curtosis of WTI)
  4. Entropy (Entropy of image)
  5. Class (Whether or not the banknote was authentic)

Let’s see how we can explore this data using Pandas and Seaborn, and make predictions from it using TensorFlow.

Importing the dataset

Firstly, let’s import the necessary Python libraries.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix

Next, let’s import the banknote CSV file and store it in a Pandas dataframe called bank_notes. We can get the dimensions of the dataset using .shape.

bank_notes = pd.read_csv('bank_note_data.csv')
bank_notes.shape
Output:
(1372, 5)

We can also use the .head(), .info(), and .describe() methods to learn more about our data.

bank_notes.head()
The first 5 rows of our dataset.
bank_notes.info()
Basic information about each column in our dataset.
bank_notes.describe()
More information about the numeric columns in our dataset.

Exploring the dataset

Now that we’ve imported our data, let’s plot some graphs to see how our data is distributed. First, we can use Seaborn’s countplot to see how many fake and real banknotes there are in the dataset.

sns.countplot(x='Class', data=bank_notes)
Countplot of banknotes in each class.

It seems we have somewhat more fake banknotes than real ones in our dataset.

Next, let’s try to find relationships between the other attributes in our dataset (in relation to our target class). We can use Seaborn’s pairplot with the hue set to the Class attribute. This way, we can easily see how the relationships differ between real and fake banknotes.

sns.pairplot(data=bank_notes, hue='Class')
Pairplot of all attributes, with the hue set to the target class.

Data preparation

When using neural networks and other deep learning-based systems, it’s usually a good idea to standardise our data. We don’t need to standardise the Class attribute, so let’s create a separate dataframe to store the other features.

bank_notes_without_class = bank_notes.drop('Class', axis=1)

Next, let’s fit a StandardScaler object from the Scikit-learn library on the independent variables and store the transformed data in a new dataframe called scaled_features.

scaler = StandardScaler()
scaler.fit(bank_notes_without_class)
scaled_features = pd.DataFrame(data=scaler.transform(bank_notes_without_class), columns=bank_notes_without_class.columns)

We can take a look at the scaled features by calling .head().
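scaled_features.head()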

The first 5 rows of our scaled features dataset.

Also, since we have 2 classes (authentic and forged) for our dependent variable, we can separate these into two different columns. Let’s rename Class to Authentic, and create a new Forged column.

# Rename 'Class' to 'Authentic'
bank_notes = bank_notes.rename(columns={'Class': 'Authentic'})
# Create a new 'Forged' column as the inverse of 'Authentic'
bank_notes.loc[bank_notes['Authentic'] == 0, 'Forged'] = 1
bank_notes.loc[bank_notes['Authentic'] == 1, 'Forged'] = 0
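Equivalently, since Forged is just the inverse of Authentic, the same column can be created in one line:

bank_notes['Forged'] = 1 - bank_notes['Authentic']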

Independent and dependent variables

Our X will be the scaled features, and our y will be both the Authentic and Forged attributes. Since NumPy arrays are compatible with TensorFlow, we can convert X and y into NumPy arrays using .values (the .as_matrix() method has been deprecated in newer versions of Pandas).

# X and y
X = scaled_features
y = bank_notes[['Authentic', 'Forged']]
# Convert X and y to NumPy arrays (.as_matrix() is deprecated; use .values instead)
X = X.values
y = y.values

Training data and test data

Now that we have our independent and dependent variables, let’s use Scikit-learn’s train_test_split to split our data into a training and a test set. We will use 20% of the original dataset for testing.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Let’s also print out the shapes of X_train, X_test, y_train, and y_test. This will help us when defining the parameters for our neural network.
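A simple way to do this:

print("TRAINING SET SHAPES")
print("X_train :", X_train.shape)
print("y_train :", y_train.shape)
print("\nTEST SET SHAPES")
print("X_test :", X_test.shape)
print("y_test :", y_test.shape)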

TRAINING SET SHAPES
X_train : (1097, 4)
y_train : (1097, 2)

TEST SET SHAPES
X_test : (275, 4)
y_test : (275, 2)

Parameters

Before setting up our neural network, it is important to define the parameters for our model. We may want to adjust these a bit later on, depending on how our model performs. Let’s first set the learning rate, the number of training epochs, and the batch size.

The learning rate of the model is a value between 0 and 1. It can be thought of as a measure of how quickly our model abandons old beliefs for new ones. A high rate means that the network changes its mind more quickly, and a lower rate means that it is reluctant to change. Here we will choose a learning rate of 0.01.

One epoch means one full pass over the training set. We want our model to go through the training set more than once to improve accuracy. However, it is important to note that a very high number of epochs carries the risk of overfitting, which reduces the performance of our neural net on unseen data. Let’s set the number of training epochs to 100.

Finally, we can set the batch size to 100. We will be using batch learning, and a batch size of 100 means that we will update our weights via back-propagation after each batch of 100 training samples.

learning_rate = 0.01
training_epochs = 100
batch_size = 100

It is also important to set the parameters for our network, and not just for the training. This includes the number of nodes for each layer in our model (namely the input layer, the hidden layer(s), and the output layer).

n_hidden_1 = 4 # number of nodes in the first hidden layer
n_hidden_2 = 4 # number of nodes in the second hidden layer
n_input = 4 # number of input features
n_classes = 2 # total classes (authentic / forged)
n_samples = X_train.shape[0] # number of training samples

TensorFlow graph input

Now that we’ve defined our parameters, let’s define the inputs we will feed into our TensorFlow graph. x and y can be defined as matrix (or tensor) placeholders.

x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])
Note: ‘None’ means that the first dimension can be of any size.

Weights and biases

Next, we need to define the weights and biases for each layer in our network. We will create dictionaries of weights and biases using the parameters we’ve already defined.

weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

Our network will have 3 layers (2 hidden layers and an output layer, excluding the input layer).

The structure of our artificial neural network.

We can set the predictions to be a tensor called preds, which will contain the output from our neural network. (The multilayer_perceptron function is defined in the next section; in a runnable script, its definition must come before this call.)

preds = multilayer_perceptron(x, weights, biases)

Cost and optimisation

Let’s use a softmax cross-entropy function for calculating the loss, and the Adam optimiser to minimise the cost.

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=preds))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

Constructing our neural network

We’re finally ready to set up our neural network! We will create a function that accepts the input x, a dictionary of weights, and a dictionary of biases. Let’s use the ReLU activation function for each hidden layer.

def multilayer_perceptron(x, weights, biases):
    '''
    x: Placeholder for the data input
    weights: Dictionary of weights
    biases: Dictionary of biases
    '''
    # First hidden layer with ReLU activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)

    # Second hidden layer with ReLU activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)

    # Output layer (no activation here; softmax is applied in the cost function)
    out_layer = tf.add(tf.matmul(layer_2, weights['out']), biases['out'])

    return out_layer

We have finally set up the data flow graph. It is now time to train our model!

Training the network

In TensorFlow, graphs aren’t executed unless a Session is created and run. The session allocates resources for the graph, and holds the actual values of intermediate results and variables.

Let’s have two loops:

  1. The outer loop runs the epochs, and
  2. The inner loop runs the batches for each epoch.

After each epoch, we can print out the cost and append it to a list of costs. That way, we can plot a line graph after training to visualise how the cost was minimised.

sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer()) # initialize_all_variables() is deprecated
costs = []
for epoch in range(training_epochs):
    avg_cost = 0.0
    # 1097 samples / batch size 100 -> 10 batches; the final partial batch is dropped
    total_batch = int(n_samples/batch_size)
    for batch in range(total_batch):
        batch_x = X_train[batch*batch_size : (batch+1)*batch_size]
        batch_y = y_train[batch*batch_size : (batch+1)*batch_size]
        _, c = sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})
        avg_cost += c / total_batch
    print("Epoch: {} cost={:.4f}".format(epoch+1, avg_cost))
    costs.append(avg_cost)

print("Model has completed {} epochs of training.".format(training_epochs))

Here’s the output:

Epoch: 1 cost=1.0476
Epoch: 2 cost=0.6022
Epoch: 3 cost=0.4642
...
Epoch: 98 cost=0.0009
Epoch: 99 cost=0.0008
Epoch: 100 cost=0.0008
Model has completed 100 epochs of training.

Below is a graph of the cost over time, created using the list of costs.
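A minimal way to produce this plot with Matplotlib (using the costs list from the training loop):

plt.plot(costs)
plt.xlabel('Epoch')
plt.ylabel('Cost')
plt.show()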

Loss over time for our neural network.

Model evaluation

Our model has now been trained! To see how well it performs, let’s count the number of correct predictions on the test set. We can then define the accuracy as the fraction of predictions that are correct.

correct_predictions = tf.cast(tf.equal(tf.argmax(preds, 1), tf.argmax(y, 1)), tf.float32)

To get the accuracy, we have to use the .eval() method and pass in a dictionary for the placeholders x and y.

accuracy = tf.reduce_mean(correct_predictions)
print("Accuracy:", accuracy.eval(feed_dict={x: X_test, y: y_test}))
Output: 1.0

Wow, it looks like our model has achieved 100% accuracy on the test set! Maybe the dataset was a little easy for our model to classify. Although we performed well, it’s important to take a step back and think about what may have caused an accuracy this high.

Comparing models

Since our neural network was pretty much spot on with its predictions, it’s important that we compare it with another model for a reality check.

We will use a random forest classifier. Let’s train it on the same data, and store the predictions in a separate array called preds_rfc.

rfc = RandomForestClassifier(n_estimators=10) 
rfc.fit(X_train, y_train)
preds_rfc = rfc.predict(X_test)

Next, let’s evaluate our predictions using a classification report and a confusion matrix.

print(classification_report(y_test, preds_rfc))
Classification report for our random forest classifier.
# Get only the 'Forged' column values from y_test and preds_rfc
y_test_forged = [item[1] for item in y_test]
preds_rfc_forged = [item[1] for item in preds_rfc]
# Print confusion matrix
print(confusion_matrix(y_test_forged, preds_rfc_forged))
Output:
[[125   2]
 [  1 147]]

The random forest classifier achieved roughly 99% accuracy (only about 1% lower than our neural net), so it’s safe to conclude that our dataset was probably just easy to classify.
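As a quick sanity check, this figure can be computed straight from the confusion matrix above:

# Accuracy = correct predictions / total test samples
print((125 + 147) / 275) # ≈ 0.989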

References

The primary sources I used are linked below:

  1. Dataset obtained from: UCI Banknote Authentication