Pipelines, Mind Maps and Convolutional Neural Networks
The first time I tried training a convolutional neural network myself, I lost track of what changes made the network better or worse. I was overwhelmed by the number of decisions I had to make and the infinite possibilities that I had to explore.
I now use pipelines to experiment with different scenarios, and mind maps to see what I have tried and what else I can try.
In this article, I will talk about how I use pipelines and mind maps for training convolutional neural networks, using the German Traffic Sign image classification project as an example.
The sections are as follows:
- Exploratory data analysis for pipeline
- Choosing a network architecture
- Experimenting with preprocessing
- Experimenting with network design
- Mind map and continuous improvement
Exploratory Data Analysis for Pipeline
I do the exploratory data analysis solely to come up with a pipeline plan. Let’s quickly look at the process in three steps:
- Understand the project objectives
- Perform data analysis for pipeline ideas
- Prepare train and validation set
The Project Objective
The objective is to classify traffic sign images from the German Traffic Sign dataset into predefined classes.
Visualization helps me to intuitively understand what I’m dealing with.
I printed the label and shape for each of the randomly selected images. The images come in different sizes.
Data Analysis for Pipeline Ideas
- The images are in different sizes
- The image brightness is fairly random
- The images may be slightly rotated
- The images may not be facing straight
- The images may not be exactly centered
- The class distribution is skewed
Ideas for Pipeline:
- Resize all images into the same shape
- Image augmentation to compensate for minor classes
- Data normalization
- Experiment with different color spaces
The list doesn’t have to be perfect as I can have as many pipeline objects as I want later on.
This is my pipeline plan, and it serves two purposes.
First, it tells me what I have done and what I still need to do. Second, it is a blueprint for the actual Pipeline object that I am going to build.
Train and Validation Data Split
Out of the 39,209 training images, I reserved 8,000 (20%) for validation. I did this before applying augmentation so that the validation set has original images only.
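A split like this can be sketched with a single shuffled index (a minimal sketch with assumed names; the actual notebook may well use scikit-learn's train_test_split instead):

```python
import numpy as np

def split_train_validation(X, y, n_val, seed=0):
    """Shuffle once, then hold out n_val samples for validation."""
    rng = np.random.RandomState(seed)
    indices = rng.permutation(len(X))
    train_idx, val_idx = indices[n_val:], indices[:n_val]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

# toy-sized stand-in for the 39,209 traffic sign images
X = np.arange(100).reshape(100, 1)
y = np.arange(100)
X_train, y_train, X_eval, y_eval = split_train_validation(X, y, n_val=20)
print(len(X_train), len(X_eval))  # 80 20
```

Doing the split before augmentation, as described above, guarantees the validation set contains only original images.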
That’s it for the exploratory data analysis.
Why Not More Exploratory Data Analysis?
Some people may find it odd that I don’t spend more time on the exploratory data analysis.
This is because there is no end to it. If I immediately work on ideas as they pop up, I'm letting random thoughts control my progress. I could easily spend days without producing any results, which can be very discouraging.
Once I have the first pipeline working, I can experiment with ideas more systematically and quantitatively measure the effect of each change. This is why I want to finish the initial exploratory data analysis as quickly as possible.
Choosing a Network Architecture
I worked on the model first.
How to Choose a Network Architecture
The first choice should be a pre-trained network that works for the same kind of problem (i.e., traffic sign classification). I could use transfer learning to reuse the pre-trained model for my project, which would save a lot of time.
The next choice is to find a well-known model built for a similar purpose and adapt it to my project. A good network architecture is very hard to come up with, so this can save a lot of time, too.
Build from Scratch
The last resort is to build a network from scratch which can be very time-consuming and risky (i.e., I may not be able to complete the project on time).
My Choice: LeNet
I chose to use LeNet by Yann LeCun. It is a convolutional neural network designed to recognize visual patterns directly from pixel images with minimal preprocessing. It can handle hand-written characters very well.
I adapted LeNet to this project, and the resulting network became the one used in the first pipeline.
I used TensorFlow and Scikit-Learn's Pipeline framework to build pipelines.
First, I wrote a simple class to easily build a convolutional neural network like below:
network1 = (NeuralNetwork()
    .input([32, 32, 3])
    .conv([5, 5, 6])
    .conv([5, 5, 16])
    .dense(120)
    .dense(84))
Having such a class was critical for me as I was planning to make many network objects with various configurations.
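The class itself isn't shown in full in the article. A minimal sketch of the chaining interface (pure Python, only recording layer configurations; a real implementation would build the corresponding TensorFlow graph, and all names here are assumptions):

```python
class NeuralNetwork:
    """Fluent builder that records one layer configuration per call.

    A real implementation would turn these records into TensorFlow ops;
    recording them first keeps networks cheap to define and compare.
    """
    def __init__(self):
        self.layers = []

    def input(self, shape):
        self.layers.append(('input', shape))
        return self  # returning self enables method chaining

    def conv(self, shape):
        self.layers.append(('conv', shape))
        return self

    def dense(self, units):
        self.layers.append(('dense', units))
        return self

network1 = (NeuralNetwork()
            .input([32, 32, 3])
            .conv([5, 5, 6])
            .conv([5, 5, 16])
            .dense(120))
print([kind for kind, _ in network1.layers])  # ['input', 'conv', 'conv', 'dense']
```

Because each method returns the object itself, a whole network variant is one readable expression, which is what makes producing many configurations painless.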
Once a network object is created, I use Scikit-Learn’s Pipeline framework to make my network object an estimator (more on this later).
Experimenting with Preprocessing
Loading and Resizing Data
I defined a transformer object to handle the image loading and resizing.
Then, I created a new Pipeline object that combined the transformer (loader) and the estimator (network1).
pipeline = make_pipeline(transformer, estimator)
I can put as many transformer objects as I want into a Pipeline object.
pipeline = make_pipeline(trans1, trans2, trans3, estimator)
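A transformer in this sense just needs fit and transform methods. A minimal resize transformer might look like the following (nearest-neighbor resize via index sampling to avoid an OpenCV dependency; the actual pipeline presumably uses cv2.resize):

```python
import numpy as np

class ImageResizer:
    """Scikit-Learn-style transformer: stateless fit, resize in transform."""
    def __init__(self, size=32):
        self.size = size

    def fit(self, X, y=None):
        return self  # nothing to learn from the data

    def transform(self, X):
        # nearest-neighbor resize by sampling source rows and columns
        resized = []
        for img in X:
            h, w = img.shape[:2]
            rows = np.arange(self.size) * h // self.size
            cols = np.arange(self.size) * w // self.size
            resized.append(img[rows[:, None], cols])
        return np.stack(resized)

images = [np.zeros((48, 64, 3)), np.zeros((20, 25, 3))]  # mixed sizes
print(ImageResizer(32).fit(images).transform(images).shape)  # (2, 32, 32, 3)
```

Any object with this fit/transform shape can be dropped into make_pipeline ahead of the estimator.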
I can train and evaluate the Pipeline object.
# training with train set
pipeline.fit(X_train, y_train)

# accuracy with validation set
score = pipeline.score(X_eval, y_eval)
For the first pipeline, I used 5 epochs (50,000 randomly selected samples per epoch) for training.
It is very likely that the network will see the same images more than once, which can cause overfitting. This will be addressed with augmentation later on.
The evaluation score is based on the 8,000-image validation set.
0 Train: 0.841 Evaluation: 0.822
1 Train: 0.939 Evaluation: 0.916
2 Train: 0.955 Evaluation: 0.932
3 Train: 0.891 Evaluation: 0.866
4 Train: 0.948 Evaluation: 0.925
This bare-bone pipeline is already performing well. This is encouraging.
It seems to be overfitting a bit but I could probably improve it by adding regularization. Should I do that now?
No, I should not. Not now.
If I change the network while building the preprocessing, it becomes harder to tell which change caused which effect. This is where I used to lose track of changes and improvements.
I should just note down my observations for later review and move on.
Image Augmentation
Generating more image variations is important, especially for the minor classes.
I wrote the following functions:
- random brightness adjustment
- random small rotation
- random small translation
- random small shear
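Two of these can be sketched with NumPy alone; rotation and shear would typically use cv2.warpAffine or similar. The parameter ranges here are illustrative guesses, not the values used in the project:

```python
import numpy as np

rng = np.random.RandomState(42)  # fixed seed for reproducibility

def random_brightness(img, low=0.5, high=1.5):
    """Scale pixel intensities by a random factor, clipped to [0, 255]."""
    return np.clip(img * rng.uniform(low, high), 0.0, 255.0)

def random_translation(img, max_shift=2):
    """Shift the image by a few pixels; np.roll wraps pixels around,
    which is a tolerable approximation for small shifts."""
    dy, dx = rng.randint(-max_shift, max_shift + 1, size=2)
    return np.roll(img, (dy, dx), axis=(0, 1))

img = np.full((32, 32, 3), 128.0)
print(random_brightness(img).shape, random_translation(img).shape)
# (32, 32, 3) (32, 32, 3)
```

Each function leaves the image shape unchanged, so augmented images flow through the rest of the pipeline untouched.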
I tested with the training images:
Then, I made a new Pipeline object for training and validation.
0 Train: 0.199 Evaluation: 0.204
1 Train: 0.464 Evaluation: 0.460
2 Train: 0.574 Evaluation: 0.572
3 Train: 0.633 Evaluation: 0.628
4 Train: 0.674 Evaluation: 0.675
There is no overfitting but the performance is much worse. I might have done the augmentation incorrectly.
This pushed me into thinking: "How much rotation should I have applied, 10 or 15? How much brightness change?" And so on.
Can I experiment with different values?
Yes, I can.
It's fine for me to play with the parameters as long as I'm not changing anything in the other boxes of the pipeline plan.
Having said that, I should examine the result more closely. The training accuracy was similar to the validation accuracy, which indicates the two sets are not completely different in nature.
Moreover, the worse performance shouldn't be surprising, since the model now has to learn from far more varied training images. The augmentation simply exposed a weakness in the pipeline and/or the model.
The real reason for the poor performance lies elsewhere.
Data Normalization
The pipeline objects tested so far did not have any data normalization.
Any machine learning lecturer will tell you to normalize data before feeding it into a model. But which normalization works best?
I wanted to measure how much improvement each of these normalizations could bring:
- x - 127.5
- x/127.5 - 1.0
- x/255.0 - 0.5
- x - x.mean()
- (x - x.mean())/x.std()
I created a new Pipeline object for each case.
Below are the 5th-epoch results (train, validation) for each case:
x - 127.5 : 0.790, 0.782
x/127.5 - 1.0 : 0.849, 0.844
x/255.0 - 0.5 : 0.813, 0.812
x - x.mean() : 0.820, 0.820
(x - x.mean())/x.std(): 0.909, 0.900
(x - x.mean())/x.std() is the winner. It’s nice to see the normalization made such a huge difference.
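The winning scheme fits naturally into the same transformer pattern (a sketch with assumed names; whether the mean and standard deviation are taken per image or over the whole training set is an implementation choice, per image here):

```python
import numpy as np

class Standardizer:
    """(x - x.mean()) / x.std(), computed per image."""
    def fit(self, X, y=None):
        return self  # per-image statistics, so nothing to learn globally

    def transform(self, X):
        X = np.asarray(X, dtype=np.float32)
        means = X.mean(axis=(1, 2, 3), keepdims=True)
        stds = X.std(axis=(1, 2, 3), keepdims=True)
        return (X - means) / stds

batch = np.random.RandomState(0).randint(0, 256, size=(4, 32, 32, 3))
out = Standardizer().fit(batch).transform(batch)
print(round(float(out.mean()), 3), round(float(out.std()), 3))  # close to 0 and 1
```

After the transform, every image has zero mean and unit variance, which keeps the input scale consistent regardless of the original brightness.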
Color Space Conversion
The images are loaded in RGB format but there are many different color spaces.
I created one Pipeline object for each color space, using the OpenCV function cvtColor to convert the RGB images into other color spaces.
Note: Grayscale images have only 1 channel, which required changing the shape of the input layer as follows:
gray_net = (NeuralNetwork()
    .input([32, 32, 1])  # 1 ch
    .conv([5, 5, 6])
    .conv([5, 5, 16])
    .dense(120)
    .dense(84))
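The grayscale conversion itself can be sketched without OpenCV, using the same ITU-R BT.601 luma weights that cv2.cvtColor applies for RGB-to-gray (hypothetical helper name):

```python
import numpy as np

def rgb_to_gray(images):
    """Weighted channel sum with BT.601 luma coefficients,
    matching cv2.cvtColor's RGB-to-grayscale convention."""
    gray = (images[..., 0] * 0.299 +
            images[..., 1] * 0.587 +
            images[..., 2] * 0.114)
    # keep an explicit channel axis to match the [32, 32, 1] input layer
    return gray[..., np.newaxis].astype(np.float32)

batch = np.random.RandomState(0).randint(0, 256, size=(4, 32, 32, 3)).astype(np.float32)
print(rgb_to_gray(batch).shape)  # (4, 32, 32, 1)
```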
The following are the 5th-epoch results:
Gray: Train: 0.903, Validation: 0.901
HSV : Train: 0.785, Validation: 0.777
HLS : Train: 0.770, Validation: 0.768
Lab : Train: 0.848, Validation: 0.844
Luv : Train: 0.844, Validation: 0.838
XYZ : Train: 0.909, Validation: 0.899
YCrCb: Train: 0.839, Validation: 0.838
YUV : Train: 0.841, Validation: 0.834
The grayscale and XYZ pipelines performed about the same as RGB (no conversion); the rest were worse. So, I decided to continue with RGB.
Ready to Move On
I completed all the boxes in the pipeline plan.
In doing so, I made 4 different Pipeline objects:
I’ll be using the third pipeline to experiment more with the network design.
Experimenting with Network Design
My second network has more filters and neurons.
network2 = (NeuralNetwork()
    .input([32, 32, 3])
    .conv([5, 5, 12])  # doubled
    .conv([5, 5, 32])  # doubled
    .dense(240))       # doubled
The performance is much better than with the first network.
0 Train: 0.853 Evaluation: 0.844
1 Train: 0.908 Evaluation: 0.908
2 Train: 0.937 Evaluation: 0.937
3 Train: 0.957 Evaluation: 0.954
4 Train: 0.951 Evaluation: 0.949
I plotted the learning curve.
I do this for almost all networks I tried. It’s insightful to see how the learning curve differs for various network configurations.
As this is a classification problem, it could be useful to check the confusion matrix as well.
The performance is good across the board, but some classes are less so. The details for each class are as follows:
I asked questions like:
- Is it overfitting? (Compare train vs. validation accuracy)
- Should I increase the number of epochs? (Is the learning curve still rising?)
- Should I change the learning rate? (Has the learning curve stopped rising?)
- What else can I do?
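The confusion matrix mentioned above takes only a few lines to compute (a NumPy sketch; scikit-learn's confusion_matrix does the same):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """matrix[i, j] counts samples of true class i predicted as class j."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix

y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
print(cm)                            # diagonal = correct predictions
print(np.diag(cm).sum() / cm.sum())  # overall accuracy (4 of 6 correct)
```

Large off-diagonal entries point directly at the class pairs the network confuses, which is what makes the matrix worth checking for every network variant.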
Network 3, 4, 5, 6, 7, 8, 9, 10 and More
I kept creating new network objects to try out different network configurations.
- More convolutional layers and filters
- More fully connected (dense) layers and neurons
- More epochs (up to 500)
- Smaller learning rate (down to 1.0e-4)
- Smaller weight initialization
- Leaky ReLU activation
- ELU (Exponential Linear Unit) activation
- Switch from AdamOptimizer to MomentumOptimizer
- Use the max pooling before and after the activation
- Balanced class distribution in training images
- And more…
Each network object is trained and evaluated using a pipeline object. I also check the learning curve and confusion matrix for each of them, noting down my observations for each case.
For more details, please see the Jupyter notebook on my GitHub.
Mind Map and Continuous Improvement
As I experimented with more and more ideas, it became harder and harder for me to remember what I had tried.
What changes made the network better or worse?
After I had worked on the project for more than a few days, I was asking this question too often as I had to scroll up and down in my Jupyter notebook to find answers.
Mind Map to the Rescue
I decided to draw a graph of the important things I tried.
I was able to quickly look up what I had tried, and come up with ideas on what else I may try.
While writing this, I looked at the map again and found it funny that my brain immediately started telling me I should've also tried this and that. It's a powerful tool.
Mind Map for Exploration
If an idea was working, I kept expanding on it. It's like a greedy search or depth-first search (DFS), but more freestyle than that.
When I scrapped an idea completely, I marked it with an "X", usually because of a bad performance result.
I find it useful to keep track of “T” (training accuracy) and “V” (validation accuracy) next to the network name. I can see how each network was doing and check where overfitting happens.
Mind Map is just a Tool
I don’t need to keep updating the mind map. When I feel a bit lost or need some direction, I update it, look at it and let my mind speak.
I don’t need to make it look beautiful. It’s just a map showing where I am.
I don’t need to put all the experiments in a map. I just put the ones around the most successful path. Sometimes, I shortened it by combining multiple changes. It gives me more space to grow ideas. It’s better to keep the map in one sheet of paper for a quick glance.
It's a tool, not the purpose.
I use it to find information quickly and get some inspiration.
I'm done with the project, but it's far from the end.
I can still see many scenarios in the mind map that I haven't tried. Thanks to the pipeline mechanism, I can go back and do more experiments.
If I want to replace the network with a completely new architecture like GoogLeNet, I can do so by using a new Pipeline object, and draw a new mind map around that network. I’m not stuck with one particular network architecture.
I’m in control with pipelines and mind maps.
That’s it for now. I hope you find it useful.
References
- German Traffic Sign Benchmarks: http://benchmark.ini.rub.de/?section=gtsrb&subsection=dataset
- The PPM format (Portable Pixmap, P6): http://en.wikipedia.org/wiki/Netpbm_format
- LeNet Demo (Yann LeCun): http://yann.lecun.com/exdb/lenet/
- Gradient-Based Learning Applied to Document Recognition (Yann LeCun): http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf
- Keras with Scikit-Learn Pipeline: http://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/
- My Github for German Traffic Sign Classifier: https://github.com/naokishibuya/car-traffic-sign-classification
- Udacity: Self-Driving Car Engineer: Traffic Sign Classifier Project: https://github.com/udacity/CarND-Traffic-Sign-Classifier-Project