A Simple Convolutional Neural Network Summary for Binary Image Classification With Keras.

Alexander Beat
Published in The Startup · 7 min read · Jul 8, 2020

Convolutional neural networks (CNNs) are the main deep learning tool for image processing. I recently used a CNN for my latest student project here at Flatiron and got to see how they work, how they differ from dense neural networks, and why they perform better when working with images in Python. In my project, I was able to classify patient x-ray images to determine whether or not they showed pneumonia. There are many other uses for image processing in the medical field and in other fields of work and study. Below is the simplest, most basic breakdown of the main steps so that you can get on your way to building a CNN for image classification with Keras.

Keras makes it very simple to build a neural network, and building a CNN will feel familiar if you've built a dense network before; the overall flow is the same. There are many more options and paths to take when building the model, along with parameters to tune, but this article is just a basic outline: instantiate the sequential model, add your layers, flatten, add the output layer, compile, and fit/train. Done. Here are the main imports you'll need, but know that there will be several others depending on your data prep, the problem you are trying to solve, and what your model's output will be.

import keras
from keras import layers
from keras import models
from keras import optimizers
from keras.models import Sequential

First, instantiate a sequential model, then add the convolutional layer with the line of code below. CNNs work by sliding a small filter over blocks of pixels in an image. This is how the network begins to find simple patterns in the pixels, which build into more complex patterns as the model trains. The main parameters you'll want to focus on for this layer are the number of filters, the size of each filter, and the image input shape. The layer expects a three-dimensional input shape, so an RGB image has a 3 in the channel dimension, while a grayscale image has a 1 there. See what I mean in the block below. You'll also want to choose an activation function for each convolutional layer. relu is very popular and works well here: it passes positive values straight through and outputs zero for anything at or below 0.

# Initialising the CNN
model = Sequential()
# Create convolutional layer. There are 3 dimensions for input shape
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)))
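To make that relu behaviour concrete, here's a tiny standalone sketch (just plain numpy, not part of the model) showing that negative values are zeroed out while positive values pass straight through.

import numpy as np

# relu(x) = max(0, x): negatives become zero, positives are unchanged.
def relu(x):
    return np.maximum(0, x)

print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]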

Your next step will be to add a pooling layer just after your convolutional layer. This is a big part of what gives a CNN its edge over a dense neural network in both accuracy and processing speed. Max pooling, which I used in my model, takes the highest pixel value from each group of pixels as the window slides across the image. This downsizes the feature maps, which helps the model learn the initial patterns in the image; those patterns are built into more complex ones as more filters and layers are added. The main parameters for max pooling are pool size and stride. Pool size works much like the filter size in the convolutional layer, but here the pooling layer drastically downsizes the feature maps, creating fewer parameters and less work for your system. Stride is the number of pixels the window shifts across the image matrix each step. In Keras, if stride is not specified, it defaults to the pool size, so in the code below the single tuple is used for both pool size and stride. Here's a link to the Keras documentation so you can read what I mean: https://keras.io/api/layers/pooling_layers/max_pooling2d/

# Pooling layer
model.add(layers.MaxPooling2D((2, 2)))
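Because the stride defaults to the pool size, the layer above is equivalent to writing out both arguments explicitly; a 2x2 pool with a stride of 2 halves the height and width of the feature maps. Just as an illustration, not something to add to the model twice:

# These two layer definitions behave identically, since strides
# falls back to pool_size when it isn't specified.
layers.MaxPooling2D((2, 2))
layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2))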

You'll find that a lot of models repeat this alternating pattern of a convolutional layer followed by a pooling layer, so you have more hidden layers to train with and can achieve better accuracy. A good idea when adding more layers is to raise the number of filters in each successive convolutional layer so the model can learn more complex patterns in the images. See what I mean in the set of layers below.

# Convolutional layer
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)))
# Pooling layer
model.add(layers.MaxPooling2D((2, 2)))
# Adding a second convolutional layer with 64 filters
# (input_shape is only needed on the first layer, so it's dropped from here on)
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
# Second pooling layer
model.add(layers.MaxPooling2D((2, 2)))
# Adding a third convolutional layer with 128 filters
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
# Third pooling layer
model.add(layers.MaxPooling2D((2, 2)))

The next step will be to flatten the output of the convolutional layers and add the output layers. Flattening turns the stacked feature maps into a 1-D vector that the dense layers can use. The final layers are dense layers, as opposed to the convolutional ones we used before. In this example the activation function on the final output layer is sigmoid, which is the usual choice for binary classification since it squashes the output to a value between 0 and 1 that can be read as a probability.

# Flattening
model.add(layers.Flatten())
# Full connection
model.add(layers.Dense(units = 512, activation = 'relu'))
model.add(layers.Dense(units = 1, activation = 'sigmoid'))

Once your layers are set, you'll compile the model. The optimizer used here is adam, a variant of stochastic gradient descent with adaptive learning rates that usually speeds up convergence during training. For binary classification tasks, set your loss function to binary cross entropy; this is what scores how far off each prediction is as your model trains. You will also want to state the metric for your model to report. In this case, I used accuracy: the fraction of images that were correctly classified.

# Compiling the CNN
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['acc'])
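To give a feel for what binary cross entropy actually computes, here's a small standalone sketch of the per-sample loss, -(y*log(p) + (1 - y)*log(1 - p)), where y is the true label (0 or 1) and p is the model's predicted probability. Confident correct predictions get a small loss, confident wrong ones a large loss.

import numpy as np

# Per-sample binary cross entropy: -(y*log(p) + (1 - y)*log(1 - p))
def binary_crossentropy(y_true, p):
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

print(binary_crossentropy(1, 0.9))  # ~0.105, confident and correct
print(binary_crossentropy(1, 0.1))  # ~2.303, confident but wrong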

Next will be fitting the model, and there are a few main things to set here. First, you'll pass in your training data and labels. If you used an image generator, it yields batches of images together with their labels, so you can pass the generator in as a single argument, as shown below. Then, state the number of epochs to train for, and pass in a validation set for the model to report accuracy and loss on after each epoch.
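If you're wondering where the training_set and val_set in the fit call below might come from, here is one common way to build them with Keras's ImageDataGenerator. The directory paths, image size, and batch size are placeholder assumptions and should match your own data.

from keras.preprocessing.image import ImageDataGenerator

# Rescale pixel values from 0-255 down to 0-1.
train_datagen = ImageDataGenerator(rescale=1./255)
val_datagen = ImageDataGenerator(rescale=1./255)

# Each folder is expected to contain one subfolder per class,
# e.g. train/NORMAL and train/PNEUMONIA (hypothetical paths).
training_set = train_datagen.flow_from_directory('train',
                                                 target_size=(64, 64),
                                                 batch_size=32,
                                                 class_mode='binary')
val_set = val_datagen.flow_from_directory('val',
                                          target_size=(64, 64),
                                          batch_size=32,
                                          class_mode='binary')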

If you have a huge amount of data, it may be good to also include early stopping so your model doesn't train for longer than it needs to; this also helps with overfitting. Use the import below and set up your early stopping to monitor the validation loss, along with the min_delta and patience parameters. A min_delta of 0 is very common. If you're unsure what number to use, it's a good idea to run your model for a few epochs, see what the validation loss values look like, and base your min_delta on that; some models use values like 0.1, 0.01, or 0.001. min_delta is the minimum change in validation loss that counts as an improvement, and patience is how many epochs the model is allowed to go without that improvement. So if you set a patience of 3 and your validation loss doesn't improve by at least min_delta for 3 epochs, training stops early. The callbacks parameter in your model fitting is what hooks the early stopping in.

from keras.callbacks import EarlyStopping
# Define the callbacks for early stopping of the model based on val loss change.
early_stopping = [EarlyStopping(monitor='val_loss', min_delta=0.01, patience=3)]
# Fitting the CNN
history = model.fit(training_set,
                    steps_per_epoch=500,
                    epochs=10,
                    callbacks=early_stopping,
                    validation_data=val_set)

Once the model has been trained, you can score it on test data. One way to see how your model performs is to call evaluate with your test data and labels, which returns the test loss and test accuracy. Another way, which I used, is to build a classification report comparing the test labels to the model's predictions.

# Prints out test loss and accuracy
results_test = model.evaluate(X_test_data, y_test_labels)
print(results_test)
# Creates a classification report showing your accuracy, precision, recall and f1.
import sklearn.metrics as metrics
y_preds = model.predict_classes(X_test_data).flatten()
print(metrics.classification_report(y_test_labels, y_preds))

From there, some other options for checking your results are to create a confusion matrix, or to plot the loss and accuracy for the training and validation data to see how your model performed across the epochs.
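As a rough sketch of that last idea, the history object returned by model.fit stores the loss and accuracy per epoch, so you can plot the training and validation curves side by side (the 'acc' and 'val_acc' keys match the metrics=['acc'] setting used above).

import matplotlib.pyplot as plt

# history.history holds one value per epoch for each tracked metric.
epochs = range(1, len(history.history['loss']) + 1)

plt.plot(epochs, history.history['loss'], label='Training loss')
plt.plot(epochs, history.history['val_loss'], label='Validation loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()

plt.plot(epochs, history.history['acc'], label='Training accuracy')
plt.plot(epochs, history.history['val_acc'], label='Validation accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()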

This was a basic outline for building a CNN, but there are plenty of other hyperparameters you could look into, and ways to modify the model to perform better and prevent overfitting. Each of the techniques described in this summary could also be explored in much more depth. For now though, hopefully this quick overview gives you a little introduction to how a CNN is built and starts you on your way to making one yourself.
