A deeper look into SqueezeDet on Keras
This is part two of our blog posts on the SqueezeDet object detection architecture. We highly recommend reading part one and going through the tutorial for the KITTI dataset first. We promised you some more juicy details on the implementation, and here they are. You do not necessarily need to know them, but they surely help if you ever decide to implement a model in Keras yourself or want to change the implementation to fit your needs. If you want to know more about the mathematics behind SqueezeDet, check out the original paper. Here we show you how to set things up for your own dataset and explain a couple of implementation details.
I don’t want to change anything, I just want it to run on my dataset!
The most important step, if you want to run SqueezeDet on your own dataset, is to adjust the anchor sizes. As we said before, think of these as a kind of prior distribution over the shapes your boxes should have. The better the anchors fit the true distribution of boxes, the faster and easier your training will be. How do we determine these shapes?
First, load all ground truth boxes and images. If your images do not all have the same size, normalize the height and width of each box by the height and width of its image. All images are normalized before being fed to the network, so we need to do the same to the bounding boxes and, consequently, to the anchors.
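The normalization step can be sketched as follows. This is a minimal example, not the repository's code: the function name `normalize_box_shapes` and the `[xmin, ymin, xmax, ymax]` box layout are assumptions made for illustration, so adapt them to however your annotations are stored.

```python
import numpy as np


def normalize_box_shapes(boxes, image_sizes):
    """Return box widths and heights as fractions of their image size.

    boxes:       (N, 4) array of [xmin, ymin, xmax, ymax] in pixels
                 (assumed layout, adapt to your annotation format).
    image_sizes: (N, 2) array of [image_width, image_height], one row
                 per box, giving the size of the image the box belongs to.
    Returns an (N, 2) array of [relative_width, relative_height].
    """
    boxes = np.asarray(boxes, dtype=float)
    image_sizes = np.asarray(image_sizes, dtype=float)
    rel_width = (boxes[:, 2] - boxes[:, 0]) / image_sizes[:, 0]
    rel_height = (boxes[:, 3] - boxes[:, 1]) / image_sizes[:, 1]
    return np.stack([rel_width, rel_height], axis=1)
```

Multiplying the relative shapes by the network's input width and height afterwards gives shapes in the pixel units of the resized images.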
Second, perform a clustering on these normalized boxes. You can just use good old k-means without feature whitening and determine the number of clusters either by eyeballing or by using the elbow method. Sklearn has a nice implementation with tutorials. While you are at it, you can also do some sanity checks on your data, if you haven’t done so already. Check for boxes that extend beyond the image or have a zero or negative width or height.
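A rough sketch of both the sanity check and the clustering, assuming scikit-learn is installed. The helper names, the `[xmin, ymin, xmax, ymax]` box layout and the candidate values of k are illustrative assumptions, not part of the SqueezeDet code.

```python
import numpy as np
from sklearn.cluster import KMeans


def invalid_box_mask(boxes, image_sizes):
    """Flag boxes that are degenerate or extend beyond their image.

    boxes:       (N, 4) array of [xmin, ymin, xmax, ymax] in pixels (assumed layout).
    image_sizes: (N, 2) array of [image_width, image_height] per box.
    """
    boxes = np.asarray(boxes, dtype=float)
    image_sizes = np.asarray(image_sizes, dtype=float)
    xmin, ymin, xmax, ymax = boxes.T
    img_w, img_h = image_sizes.T
    degenerate = (xmax <= xmin) | (ymax <= ymin)  # zero or negative size
    outside = (xmin < 0) | (ymin < 0) | (xmax > img_w) | (ymax > img_h)
    return degenerate | outside


def cluster_box_shapes(shapes, k_values=(3, 6, 9, 12), seed=0):
    """Run k-means on (N, 2) box shapes for several candidate k.

    Returns {k: (centroids, inertia)}. Plotting inertia against k and
    looking for the bend in the curve is the elbow method.
    """
    shapes = np.asarray(shapes, dtype=float)
    results = {}
    for k in k_values:
        kmeans = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(shapes)
        results[k] = (kmeans.cluster_centers_, kmeans.inertia_)
    return results
```

The centroids for the k you settle on are the candidate anchor shapes.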
If you are satisfied, add the cluster centroids as the new anchor shapes to your squeeze.config file, or change them in create_config.py so that all future configs use the new shapes.
Image format and sizes
SqueezeDet’s convolutions and poolings are set up in such a way that images with a width of 1248 and a height of 384 result in a grid of 78 x 24. Thus, if you change the size of the images, the number of grid cells changes accordingly.
If your images generally have a different format, you can easily change the aspect ratio, as long as you keep the total number of pixels, and thus of grid cells, the same. An example would be a vertical format, such as documents, with a height of 768 and a width of 624, resulting in a grid of 48 x 39.
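The arithmetic behind these numbers can be checked with a few lines. The downsampling factor of 16 is not stated anywhere above; it is inferred from the sizes in the text (1248 / 78 = 384 / 24 = 16), and rejecting side lengths that are not multiples of 16 is a choice of this sketch rather than a documented property of the network.

```python
STRIDE = 16  # assumption: inferred from 1248 / 78 and 384 / 24


def grid_cells(side_length, stride=STRIDE):
    """Number of grid cells along one image side of the given length."""
    if side_length % stride != 0:
        raise ValueError("side length %d is not a multiple of %d" % (side_length, stride))
    return side_length // stride


def same_pixel_budget(size_a, size_b):
    """True if two (side, side) image sizes contain the same number of pixels."""
    return size_a[0] * size_a[1] == size_b[0] * size_b[1]


default_size = (1248, 384)
document_size = (768, 624)

print([grid_cells(s) for s in default_size])    # grid for the default size
print([grid_cells(s) for s in document_size])   # grid for the document format
print(same_pixel_budget(default_size, document_size))
```

Both sizes contain the same number of pixels, which is why the document format yields the same number of grid cells, just arranged differently.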
If your images are smaller, we recommend stretching them to the default size. You can also upsample them beforehand. We tried a smaller image size with the resulting sparser grid, and it did not seem to work out.
If your images are bigger and you are not satisfied with the results of the default image size, you can try using a denser grid, as details might get lost during the downscaling.
If you want anything more fancy, you would have to change the architecture.
As for the actual training, we recommend starting with a small batch size of 1, 2 or 4. A small batch size entails high stochastic noise, which makes escaping an initial bad local minimum easier, at least according to some theories. You can also try different learning rates if the default of 0.01 does not work. Meaningful rates are usually between 0.1 and 0.0001. Additionally, use the provided ImageNet weights. Even if your domain is completely different, training will go way smoother. If any readers have a clue as to exactly why this works, please let us know in the comments. It cannot be just the filters in the earlier layers, as this works even if you classify black and white images into two classes. Even there, ImageNet weights make the network converge faster.
I want to change the code, what do I have to look out for?
Separate Evaluation script
One of the first things you might notice is that training and evaluation have been split into two scripts. The first, train.py, performs the actual training and saves a checkpoint after each epoch. The second one, eval.py, periodically looks for new checkpoints and evaluates metrics and losses on the validation set. Why might you want to do this?
The first reason is that currently, Keras does not support the evaluation of data generators in the TensorBoard callback. You could write a custom callback yourself, but the training process is halted until all callbacks have finished, one after the other. Thus, if you compute complicated metrics during evaluation, your training may be delayed for quite some time. Another reason might be that you want to fully utilize your GPUs. While you can just run the training on multiple GPUs, this comes with an overhead. Splitting things up avoids this, as no direct interaction between training and evaluation is needed, and it also makes it easy to run the two one after the other.
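The polling idea behind a separate evaluation script can be sketched with the standard library alone. This is not the code of eval.py: the function names, the `*.hdf5` file pattern and the polling interval are assumptions for illustration, and `evaluate` stands for whatever callable loads a checkpoint and computes your metrics.

```python
import glob
import os
import time


def new_checkpoints(checkpoint_dir, seen, pattern="*.hdf5"):
    """Return checkpoint paths not seen before, oldest first, and mark them as seen."""
    paths = sorted(glob.glob(os.path.join(checkpoint_dir, pattern)),
                   key=os.path.getmtime)
    fresh = [path for path in paths if path not in seen]
    seen.update(fresh)
    return fresh


def watch(checkpoint_dir, evaluate, poll_seconds=30.0, max_polls=None):
    """Call evaluate(path) once for every checkpoint that appears in the directory.

    max_polls=None polls forever; a number makes the loop stop, which is
    handy for testing.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in new_checkpoints(checkpoint_dir, seen):
            evaluate(path)
        polls += 1
        time.sleep(poll_seconds)
```

One caveat of such a loop is that a checkpoint might be picked up while it is still being written, so a real script may want to wait briefly or retry on a failed load.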
In the training script, the weights are not loaded with the native Keras loading function, but with a custom one. If you take a peek inside, the function goes through all the weights and biases, checks the number of axes and loads only the minimum overlap between your model’s weights and those in the save file. This enables you to load the weights of a model with a slightly different architecture, for example one with more filters in a convolutional layer or fewer classes in the prediction layer. Now, instead of training everything from scratch after a small tweak, you can reuse most of the expensively obtained weights you already have.
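The core of such an overlap-based loader can be illustrated on plain NumPy arrays. This is a sketch of the idea, not the repository's function: `copy_overlap` and `load_overlapping` are hypothetical names, and skipping arrays whose number of axes differs is an assumption of this example.

```python
import numpy as np


def copy_overlap(target, source):
    """Return a copy of target with the overlapping region taken from source.

    Along every axis only the first min(target_size, source_size) entries
    are copied. Arrays with a different number of axes are left untouched.
    """
    result = target.copy()
    if target.ndim != source.ndim:
        return result
    overlap = tuple(slice(0, min(t, s)) for t, s in zip(target.shape, source.shape))
    result[overlap] = source[overlap]
    return result


def load_overlapping(target_weights, source_weights):
    """Apply copy_overlap pairwise to two lists of weight arrays."""
    return [copy_overlap(t, s) for t, s in zip(target_weights, source_weights)]
```

With Keras, lists like these can be obtained from `model.get_weights()` and written back with `model.set_weights()`, provided both models list their layers in the same order.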
How to add things to tensorboard
A big chunk of the eval.py script consists of creating TensorFlow variables, placeholders and assign operations. At first glance, this seems a little over the top. Let’s say you want to add a new variable that is written to TensorBoard. Intuitively, you might simply assign the new value to the variable at every iteration of your evaluation loop.
You open TensorBoard and things look perfectly fine. But if you count the operations in the TensorFlow graph after running this, you see that your graph has way more operations than expected.
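A minimal sketch of the naive pattern and of how to count the operations, assuming the TensorFlow 1.x graph API (accessed here through `tf.compat.v1` so that it also runs on TensorFlow 2). The variable name `val_loss` and the dummy values are made up for illustration.

```python
import tensorflow as tf

tf1 = tf.compat.v1


def count_ops_naive(num_steps):
    """Assign inside the loop and return the graph size before and after."""
    graph = tf.Graph()
    with graph.as_default():
        loss_var = tf1.Variable(0.0, name="val_loss", trainable=False)
        tf1.summary.scalar("val_loss_summary", loss_var)
        merged = tf1.summary.merge_all()
        init = tf1.global_variables_initializer()
        with tf1.Session(graph=graph) as sess:
            sess.run(init)
            ops_before = len(graph.get_operations())
            for step in range(num_steps):
                # Naive: assign() builds NEW graph nodes on every call.
                sess.run(loss_var.assign(float(step)))
                # The serialized summary would normally be handed to a
                # tf.compat.v1.summary.FileWriter via add_summary().
                sess.run(merged)
            ops_after = len(graph.get_operations())
    return ops_before, ops_after


before, after = count_ops_naive(100)
print("operations before the loop:", before)
print("operations after the loop: ", after)
```

The second number grows with the number of iterations, even though the values shown in TensorBoard would look correct.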
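This is because at every iteration, a new assign operation is created, and every single one gets stored inside the TensorFlow graph. If you check the memory consumption, you will see it increase until no memory is available. The proper way is to create the variable, a placeholder and the assign operation once, outside the loop, and to only feed new values through the placeholder at every iteration.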
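A sketch of the pattern with everything created once, again assuming the TensorFlow 1.x graph API via `tf.compat.v1`. The names are illustrative and this is not a copy of eval.py.

```python
import tensorflow as tf

tf1 = tf.compat.v1


def count_ops_fixed(num_steps):
    """Create variable, placeholder and assign op once, then only feed values."""
    graph = tf.Graph()
    with graph.as_default():
        loss_var = tf1.Variable(0.0, name="val_loss", trainable=False)
        loss_ph = tf1.placeholder(tf.float32, shape=(), name="val_loss_ph")
        assign_op = loss_var.assign(loss_ph)  # built exactly once
        tf1.summary.scalar("val_loss_summary", loss_var)
        merged = tf1.summary.merge_all()
        init = tf1.global_variables_initializer()
        with tf1.Session(graph=graph) as sess:
            sess.run(init)
            ops_before = len(graph.get_operations())
            for step in range(num_steps):
                # Only data flows into the graph here, no new nodes are built.
                sess.run(assign_op, feed_dict={loss_ph: float(step)})
                # Pass the result to FileWriter.add_summary() to log it.
                sess.run(merged)
            ops_after = len(graph.get_operations())
            final_value = float(sess.run(loss_var))
    return ops_before, ops_after, final_value


before, after, value = count_ops_fixed(100)
print("operations before the loop:", before)
print("operations after the loop: ", after)
```

If you want TensorFlow to enforce this, calling `graph.finalize()` before the loop makes any later attempt to add an operation raise an error instead of silently growing the graph.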
If at some point you want to write your own variables to TensorBoard, for example inside a custom callback, remember where the variables and operations are created. We hope this helps you with your own experiments.