Object Localization using Keras
In this blog post, I will explain the task of object localization and how to implement a CNN-based architecture that solves it.
Resources:
code: https://colab.research.google.com/drive/169pJ-xECBWDW9Q92naaNE3oRyBr7D-uh#scrollTo=IjFHABNA7nZL
images: https://drive.google.com/drive/folders/1YsxDywjUZi-l_YgzXt__9__eUU-I_6EH?usp=sharing
Agenda
- What is Object Localization?
- How to perform Object Localization using Deep Neural Networks
- Step-by-step Keras implementation
Object Localization
Object localization is the task also known as “classification with localization”: given an image, classify the object that appears in it and find its location in the image, usually with a bounding box. In object localization, only a single object appears in each image; when more than one object can appear, the task is called “object detection”.
How to perform Object Localization using DNNs?
Object localization can be treated as a regression problem, i.e. predicting continuous values (such as a weight or a salary). For instance, we can represent our output (a bounding box) as a tuple of size 4, as follows:
(x, y, height, width)
- (x, y): the coordinates of the top-left corner of the bounding box
- height: the height of the bounding box
- width: the width of the bounding box
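To make this concrete, here is a small example of converting a pixel-space box into this normalized representation; the image size and box values are made up for illustration:

```python
# A bounding box as a 4-tuple (x, y, height, width), with all values
# normalized to [0, 1] relative to the image dimensions.
# Hypothetical example values:
img_w, img_h = 200, 100                    # image size in pixels
x_px, y_px, h_px, w_px = 50, 20, 40, 80    # box in pixels

box_norm = (x_px / img_w, y_px / img_h, h_px / img_h, w_px / img_w)
# box_norm == (0.25, 0.2, 0.4, 0.4)
```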
Hence, we can use a CNN architecture that will output 4 numbers, but what will be our architecture?
First, notice that these 4 numbers have some constraints: (x, y) must lie inside the image, and so must x+width and y+height. We will scale the image width and height to be 1.0. To make sure the CNN outputs are in the range [0,1], we will use a sigmoid activation on the output layer. The sigmoid enforces that (x, y) lies inside the image, but not that x+width and y+height do; that property will have to be learned by the network during training.
Suggested architecture:
What about the loss function? The outputs of a sigmoid can be treated as probabilistic values, so we can use the binary_crossentropy loss.
Usually this loss is used with binary ground-truth values ({0, 1}), but it doesn’t have to be: it accepts any values in [0, 1]. For our use, the ground-truth values are indeed in [0, 1], since they represent a location and dimensions inside an image.
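To see that binary cross-entropy behaves sensibly with continuous targets, here is a small NumPy sketch (the values are illustrative): the loss is minimized when the prediction equals the target, even when the target is not 0 or 1.

```python
import numpy as np

def bce(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy, averaged over elements.
    Valid for any targets in [0, 1], not only hard 0/1 labels."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(np.mean(-y_true * np.log(y_pred)
                         - (1 - y_true) * np.log(1 - y_pred)))

y_true = np.array([0.25, 0.20, 0.40, 0.40])   # a ground-truth box
close  = np.array([0.26, 0.19, 0.41, 0.38])   # a good prediction
far    = np.array([0.90, 0.70, 0.05, 0.95])   # a bad prediction

# Lower loss for the prediction that is closer to the ground truth:
assert bce(y_true, close) < bce(y_true, far)
```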
Step-by-Step Keras implementation
We will solve an Object Localization task in three steps:
1) Synthetic use case: detect white blobs on a black background
2) Semi-synthetic use case: detect cats on a black background
3) Final use case: detect cats on a natural background
1. Synthetic use case:
First of all, we will import and define the following:
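The exact import cell isn't reproduced here; a typical set of imports and constants for this tutorial might look like the following (the constant values are my assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten

IMG_SIZE = 128    # assumed input resolution (width == height)
BATCH_SIZE = 16   # assumed batch size
```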
Then, we will define our CNN-based model. We will perform transfer learning, using a pre-trained version of VGG.
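A sketch of such a model, assuming a 128×128 RGB input and an illustrative head size (the notebook's exact layers may differ):

```python
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten

def build_model(img_size=128, weights="imagenet"):
    # Pre-trained VGG16 convolutional base, frozen during training.
    base = VGG16(include_top=False, weights=weights,
                 input_shape=(img_size, img_size, 3))
    base.trainable = False
    x = Flatten()(base.output)
    x = Dense(128, activation="relu")(x)
    # Four outputs in [0, 1] via sigmoid: (x, y, height, width).
    out = Dense(4, activation="sigmoid")(x)
    return Model(base.input, out)

# weights="imagenet" (the default) gives transfer learning but downloads
# the VGG weights; weights=None builds the same architecture untrained.
model = build_model(weights=None)
model.compile(optimizer="adam", loss="binary_crossentropy")
```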
Our next step is to create a dataset using a Python generator. It will create batches of white blobs over a black background.
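A generator along these lines might look as follows; image size, blob size range, and batch size are my assumptions:

```python
import numpy as np

def blob_gen(batch_size=16, img_size=128, seed=0):
    """Yield (images, boxes) batches: one white rectangular blob per
    black image. Boxes are (x, y, height, width), normalized to [0, 1]."""
    rng = np.random.default_rng(seed)
    while True:
        imgs = np.zeros((batch_size, img_size, img_size, 3), np.float32)
        boxes = np.zeros((batch_size, 4), np.float32)
        for i in range(batch_size):
            h = int(rng.integers(10, img_size // 2))
            w = int(rng.integers(10, img_size // 2))
            y = int(rng.integers(0, img_size - h))
            x = int(rng.integers(0, img_size - w))
            imgs[i, y:y + h, x:x + w] = 1.0   # rows index y, columns index x
            boxes[i] = (x / img_size, y / img_size, h / img_size, w / img_size)
        yield imgs, boxes

imgs, boxes = next(blob_gen())
```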
Now, we can train the model to predict the ground truth bounding-box, and visualize its performance:
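As a self-contained sketch of this step, the snippet below trains a tiny stand-in CNN on a miniature blob generator so that it runs quickly; the notebook trains the VGG-based model instead:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D

# Tiny stand-in model; the real one uses the frozen VGG base.
model = Sequential([
    Conv2D(8, 3, activation="relu", input_shape=(32, 32, 3)),
    MaxPooling2D(),
    Flatten(),
    Dense(4, activation="sigmoid"),   # (x, y, height, width) in [0, 1]
])
model.compile(optimizer="adam", loss="binary_crossentropy")

def tiny_blob_gen(batch_size=8, img_size=32, seed=0):
    rng = np.random.default_rng(seed)
    while True:
        imgs = np.zeros((batch_size, img_size, img_size, 3), np.float32)
        boxes = np.zeros((batch_size, 4), np.float32)
        for i in range(batch_size):
            h, w = (int(v) for v in rng.integers(4, 16, size=2))
            y = int(rng.integers(0, img_size - h))
            x = int(rng.integers(0, img_size - w))
            imgs[i, y:y + h, x:x + w] = 1.0
            boxes[i] = (x / img_size, y / img_size, h / img_size, w / img_size)
        yield imgs, boxes

# A few steps only, for illustration; the notebook trains much longer.
model.fit(tiny_blob_gen(), steps_per_epoch=5, epochs=1, verbose=0)
pred = model.predict(next(tiny_blob_gen())[0], verbose=0)   # shape (8, 4)
```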
2. Semi-synthetic use case:
For this use case, we will use the same architecture with a different type of image: images of a cat over a black background. This is achieved using a different generator:
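A sketch of such a generator; since the notebook's actual cat image lives on Drive, the demo below pastes an arbitrary sprite array instead:

```python
import numpy as np

def cat_gen(cat_img, batch_size=16, img_size=128, seed=0):
    """Yield (images, boxes): `cat_img` (an HxWx3 float array in [0, 1])
    pasted at a random position over a black background."""
    rng = np.random.default_rng(seed)
    ch, cw = cat_img.shape[:2]
    while True:
        imgs = np.zeros((batch_size, img_size, img_size, 3), np.float32)
        boxes = np.zeros((batch_size, 4), np.float32)
        for i in range(batch_size):
            y = int(rng.integers(0, img_size - ch))
            x = int(rng.integers(0, img_size - cw))
            imgs[i, y:y + ch, x:x + cw] = cat_img
            boxes[i] = (x / img_size, y / img_size, ch / img_size, cw / img_size)
        yield imgs, boxes

# Stand-in "cat": any small RGB array works for demonstration.
sprite = np.random.default_rng(1).random((24, 32, 3)).astype(np.float32)
imgs, boxes = next(cat_gen(sprite))
```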
The output of the above is:
Now, we can define a similar model, train it and examine its results:
Our trained model is capable of locating a cat in black-background images:
3. Final use case
In this use case, we will draw our cat over natural backgrounds. As in the second use case, we will need to implement a new image generator: each image will consist of a random-sized cat at a random position, drawn over a natural image that serves as its background. The background image is picked at random from a set of natural images.
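A sketch of `natural_cat_gen()`; a simple nearest-neighbour resize stands in for whatever resizing the notebook uses, and the cat and backgrounds below are placeholder arrays:

```python
import numpy as np

def resize_nn(img, h, w):
    """Nearest-neighbour resize (stand-in for PIL/OpenCV resizing)."""
    ys = np.arange(h) * img.shape[0] // h
    xs = np.arange(w) * img.shape[1] // w
    return img[ys][:, xs]

def natural_cat_gen(cat_img, backgrounds, batch_size=16, img_size=128, seed=0):
    """Yield (images, boxes): a random-sized cat at a random position,
    drawn over a background picked at random from `backgrounds`."""
    rng = np.random.default_rng(seed)
    aspect = cat_img.shape[1] / cat_img.shape[0]
    while True:
        imgs = np.zeros((batch_size, img_size, img_size, 3), np.float32)
        boxes = np.zeros((batch_size, 4), np.float32)
        for i in range(batch_size):
            bg = backgrounds[rng.integers(len(backgrounds))]
            imgs[i] = resize_nn(bg, img_size, img_size)
            h = int(rng.integers(img_size // 8, img_size // 2))
            w = min(int(h * aspect), img_size - 1)   # keep aspect ratio
            y = int(rng.integers(0, img_size - h))
            x = int(rng.integers(0, img_size - w))
            imgs[i, y:y + h, x:x + w] = resize_nn(cat_img, h, w)
            boxes[i] = (x / img_size, y / img_size, h / img_size, w / img_size)
        yield imgs, boxes

rng = np.random.default_rng(1)
sprite = rng.random((24, 32, 3)).astype(np.float32)
backgrounds = [rng.random((80, 100, 3)).astype(np.float32) for _ in range(3)]
imgs, boxes = next(natural_cat_gen(sprite, backgrounds))
```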
The natural_cat_gen() output will be similar to:
Next, we will define a new model, train it and visualize some of its results: