Neural Networks 101: Introduction to Keras (Part 1)

Carson Bentley
The School of AI (official)
May 16, 2019 · 5 min read

Artificial Intelligence is all the buzz… and not without substance! From self-driving cars to improvements in medical research, AI is changing the way we organize our businesses and society. One algorithm in particular — the artificial neural network — stands at the center of a recent set of technology breakthroughs. However, getting started coding neural networks can be an intimidating challenge. With all the different coding libraries available, you might be wondering where to begin.

Keras is a high-level Python library for coding neural networks that’s easy to get started with.

First developed in 2015 by Francois Chollet, Keras has risen in popularity and is now supported as the official high-level API for TensorFlow. The main advantage of a high-level API like Keras is that it lets you define and train a model in only a few lines of code.

Today, we will be teaching our neural network to learn an XOR (exclusive or) type relationship. Imagine trying to separate a 2x2 checkerboard into red and black with one line. This type of problem cannot be solved by drawing a single line to separate the classes. Since a single layer ‘perceptron’ is only capable of linear classification, we will use a two layer neural network.
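For reference, here’s the XOR truth table, which you can print with a couple of lines of plain Python:

# XOR: the output is 1 only when the two inputs differ
for a in (0, 1):
    for b in (0, 1):
        print(a, b, a ^ b)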

Be sure to check out the Colab notebook provided with this article. You can run a code cell by selecting it and pressing shift+enter, or by clicking the play button at the top left of the cell.

Let’s start by importing Keras and checking the version.

import keras
keras.__version__

You should see a string such as ‘2.2.4’. Putting a version check at the top of a notebook is good practice, as it encourages reproducible results.

Next, we’ll import NumPy for its convenient math functions, as well as matplotlib.pyplot to display a graphical result.

import numpy as np
import matplotlib.pyplot as plt

Let’s create a toy data set in the shape of a 2x2 checkerboard. Each of the four clusters will contain 100 pairs of (x, y) coordinates.

# class A: two clusters centered at (2, 2) and (-2, -2)
north_east = np.random.randn(100, 2) + 2
south_west = np.random.randn(100, 2) - 2
class_a = np.concatenate([north_east, south_west], 0)
# class B: two clusters centered at (-2, 2) and (2, -2)
north_west_x = np.random.randn(100, 1) - 2
north_west_y = np.random.randn(100, 1) + 2
north_west = np.concatenate([north_west_x, north_west_y], 1)
south_east_x = np.random.randn(100, 1) + 2
south_east_y = np.random.randn(100, 1) - 2
south_east = np.concatenate([south_east_x, south_east_y], 1)
class_b = np.concatenate([north_west, south_east])

We can display a 2D scatter plot using the syntax plt.scatter(x, y), with an optional argument c for color.

plt.scatter(class_a[:,0], class_a[:,1], c='red')
plt.scatter(class_b[:,0], class_b[:,1], c='blue')
plt.show()

To generate our labels efficiently, we can make use of the np.ones and np.zeros functions provided by NumPy. These allow us to quickly generate arrays of ones or zeros in any shape we might need. (Note: since our outputs are binary, we won’t need one-hot encoding or a softmax activation.)

labels = np.concatenate([np.ones((200, 1)), np.zeros((200, 1))])
print(labels[:5], "...\n")
print(labels[-5:])
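As an aside to the note above, if we had more than two classes, we would one-hot encode the labels instead, for example with keras.utils.to_categorical. A minimal sketch of that alternative (not needed for this tutorial):

# hypothetical multi-class labels 0, 1, 2 become one-hot rows like [0, 1, 0]
example_labels = np.array([0, 1, 2, 1])
print(keras.utils.to_categorical(example_labels, num_classes=3))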

Next we’ll shuffle the data. This is an important step to get right. Since our inputs and labels are contained in separate arrays, it’s useful to generate a set of shared indices so that both are shuffled the same way.

train_data = np.concatenate([class_a, class_b])
print("before shuffle:\n", train_data[:5])
# indices for shuffling the data (input) and the labels (output)
indices = np.array(range(400))
np.random.shuffle(indices)
print("shuffle indices:\n", indices[:10], "...")
# reorder the data with the shuffled indices
train_data = train_data[indices]
print("after shuffle:\n", train_data[:5])
train_labels = labels[indices].astype('int')

Our inputs are 400 pairs of (x, y) coordinates. Our outputs are 400 labels each with a value of either 1 or 0.

print(train_data.shape, train_labels.shape)

We can use the Sequential class to define our model.

model = keras.Sequential()

To assemble a series of layers, we can use the model.add() method. This expects a layer object as an argument.

Densely connected layers, also known as fully-connected layers, are a general-purpose building block. Each layer is defined by the number of nodes it contains, along with the activation function applied to its output. The ‘sigmoid’ activation function ensures that we receive our final answer as a value between 0 and 1. Also note that the first layer takes an argument specifying the shape of the input.

model.add(keras.layers.Dense(4, activation='relu', input_shape=(2,)))
model.add(keras.layers.Dense(6, activation='relu'))
model.add(keras.layers.Dense(1, activation='sigmoid'))

For the optimizer, we’ll use ‘adam’, a form of gradient descent that adapts a separate learning rate for each parameter using running estimates of the gradients, which helps it ride out difficult terrain in the loss landscape. Other options include ‘sgd’ (stochastic gradient descent) and ‘rmsprop’. You can find a full list of optimizers in the Keras documentation.
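If you want more control, such as setting the learning rate explicitly, you can pass an optimizer object to compile instead of a string. A minimal sketch (the variable name custom_adam is just for illustration):

from keras.optimizers import Adam
# equivalent to the 'adam' string default, but with an explicit learning rate
custom_adam = Adam(lr=0.001)
# later: model.compile(optimizer=custom_adam, loss='binary_crossentropy')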

The loss function ‘binary_crossentropy’ is specialized for binary classification problems.

model.compile(optimizer='adam', loss='binary_crossentropy')
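For intuition, binary cross-entropy heavily penalizes confident wrong answers. Here’s a minimal NumPy sketch of the formula (not part of the original notebook):

# binary cross-entropy for one prediction p against a true label y:
# loss = -(y * log(p) + (1 - y) * log(1 - p))
def bce(y_true, y_pred):
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(bce(1, 0.9))  # small loss: confident and correct
print(bce(1, 0.1))  # large loss: confident and wrong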

To train our model, we use the fit() method.

An epoch is a cycle through the entirety of the training data.

Batch size is the number of samples we use before updating our model’s weights. Powers of two (16, 32, 64, 128) are commonly chosen for memory efficiency. Larger batches give smoother gradient estimates but fewer weight updates per epoch, so training can take more epochs to converge.

model.fit(train_data, train_labels, epochs=200, batch_size=32)

To test our model, we can pass in some new data.

test_data = np.array([[2, 2], [-2, -2], [-2, 2], [2, -2]])
predictions = model.predict(test_data)
print(predictions)

To get the predictions as binary values, we’ll simply round to the nearest whole number with np.around().

predictions_binary = np.around(predictions).reshape(4,)
print(predictions_binary)

As you can see from the plot below, our model has correctly separated our classes into a checkerboard configuration.

plt.scatter(test_data[:,0], test_data[:,1], c=predictions_binary)
plt.show()

As an exercise, see if you can produce a larger batch of points for testing, and graph the results.

Hint: np.random.randn(N, 2) will give you N random 2D vectors drawn from a standard normal distribution.
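If you get stuck, one possible approach looks like this (the scaling factor of 2 is an assumption, used to spread the points across the checkerboard):

# generate 500 random points and spread them out over the board
big_test = np.random.randn(500, 2) * 2
big_predictions = np.around(model.predict(big_test)).reshape(500,)
plt.scatter(big_test[:,0], big_test[:,1], c=big_predictions)
plt.show()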

This article is part of an effort to fill in gaps for the free online course Data Lit, taught by Siraj Raval, Colin Skow, Sathish Ravichandran, Kurt Koo, and myself.
