Image feature extraction using pretrained models in Caffe

Ashish
Jan 8, 2018

Caffe is a deep learning framework that lets researchers and practitioners build and train complex deep neural networks without writing much code. If you are interested in learning about deep neural networks and Caffe, the resources listed in the References section are a good place to start.

Deep Learning and Feature extraction

As a beginner, I recently spent a lot of time working out how to preprocess images and extract features properly, because CaffeNet-style models require a specific input format when images are fed to a trained model for feature extraction. This blog post grew out of one of the hurdles I ran into during my thesis work at Boise State University. In this post, I will talk about how to transform images for feature extraction and how to do the actual feature extraction in Caffe.

Feature extraction basically involves the following things:

  1. an input image (or a batch of images), of course!
  2. a pre-trained model (.caffemodel): a binary file that stores the learned weights and biases for each layer of the network (here for more details)
  3. a model definition (.prototxt) file that contains the structure of the network being used
  4. the name of the target feature extraction layer (e.g. “fc7” in AlexNet)

Let’s jump straight into how to do feature extraction in Caffe and write some code. I am using the Python wrapper for Caffe. Make sure Caffe is installed before proceeding.

We start with the necessary imports.

% python
# Imports
import caffe
import pickle
import numpy as np

To make the process easier to explain, I will assume that we want to extract features from the fc7 layer of the pretrained AlexNet model. The pretrained model (bvlc_alexnet.caffemodel) and model definition (deploy.prototxt) files for AlexNet can be found here. Notice input_param { shape: { dim: 10 dim: 3 dim: 227 dim: 227 } } in the deploy.prototxt file: the first dimension is the batch size. So, we will use a batch_size of 10 for feature extraction.
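If you would rather read the batch size from the file than copy it by hand, the following is a minimal sketch of one way to pull the dimensions out of the prototxt text. It is a plain regular-expression helper, not a real protobuf parser, and the function name read_input_shape is my own (hypothetical). It assumes the input shape is written in the input_param { shape: { dim: ... } } form shown above; older deploy files that use input_dim entries would need a different pattern.

```python
import re

def read_input_shape(prototxt_text):
    """Return the dims of the first `shape { dim: ... }` block as a list of ints."""
    # Find the first "shape: { ... }" (or "shape { ... }") block in the text.
    match = re.search(r"shape\s*:?\s*\{([^}]*)\}", prototxt_text)
    if match is None:
        raise ValueError("no input shape block found")
    # Collect every "dim: N" entry inside that block, in order.
    return [int(d) for d in re.findall(r"dim\s*:\s*(\d+)", match.group(1))]

# The input layer line quoted above, embedded in a made-up layer wrapper.
example = ('layer { name: "data" type: "Input" top: "data" '
           'input_param { shape: { dim: 10 dim: 3 dim: 227 dim: 227 } } }')

batch_size, channels, height, width = read_input_shape(example)
print(batch_size, channels, height, width)
```

For a real file you would pass in open(model_def).read() instead of the example string.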

% python
# required only if working in gpu mode
gpu_id = <gpu_id>
extract_from_layer = "fc7"
input_images_file = "/path/to/file/containing/image-paths"
model_def = "/path/to/deploy.prototxt"
pretrained_model = "/path/to/pretrained-model.caffemodel"
# output file to write extracted features to disk
output_pkl_file_name = "path/to/output-file.pkl"
# change based on your deploy.prototxt file
batch_size = 10

input_images_file is the path to a text file that lists the image paths, one per line, as below:

/path/to/image1
/path/to/image2
...

We parse input_images_file line by line and load the image paths into a list.

% python
with open(input_images_file, 'r') as ext_file:
    image_paths_list = [line.strip() for line in ext_file]
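One thing to watch for when reading the list file is a blank line (a trailing newline at the end of the file is common), which would otherwise end up as an empty path. Below is a small sketch of a reader that skips blank lines. The helper name read_image_paths and the temporary demo file are assumptions made for illustration only.

```python
import os
import tempfile

def read_image_paths(list_file):
    """Read one image path per line, ignoring blank lines."""
    with open(list_file, "r") as handle:
        return [line.strip() for line in handle if line.strip()]

# Demonstration with a temporary list file containing placeholder paths.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("/path/to/image1\n\n/path/to/image2\n")
    demo_file = tmp.name

demo_paths = read_image_paths(demo_file)
os.remove(demo_file)
print(demo_paths)
```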

Optionally, if you are using GPUs for computation, set the execution mode to GPU.

% python
# set gpu mode
caffe.set_mode_gpu()
caffe.set_device(gpu_id)

We then load the images using the caffe.io module. caffe.io.load_image internally uses skimage.io.imread and returns each image as an H x W x C array in RGB order with values between 0 and 1, where H, W and C are the height, width and number of channels of the image, respectively.

% python
images_loaded_by_caffe = [caffe.io.load_image(im) for im in image_paths_list]

However, certain Caffe models like CaffeNet and AlexNet expect input images to have:

  1. values in the range [0, 255]
  2. BGR channel order
  3. CxHxW order of input blob dimensions

So next we transform the loaded images to have the expected value range, channel order and dimension order.

% python
# set up transformer - creates transformer object
# note: this reads the input blob shape from `net`, so the caffe.Net object
# (created in the next step) must exist before this line runs
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
# transpose image from HxWxC to CxHxW
transformer.set_transpose('data', (2,0,1))
# swap image channels from RGB to BGR
transformer.set_channel_swap('data', (2,1,0))
# set raw_scale = 255 to multiply with the values loaded with caffe.io.load_image
transformer.set_raw_scale('data', 255)
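To make the three transformer settings above more concrete, here is a rough NumPy-only sketch of what they do to a single image. It is an illustration, not Caffe's implementation: the function name manual_preprocess is hypothetical, and the sketch leaves out the resize to the network's input size that the real transformer also performs.

```python
import numpy as np

def manual_preprocess(image_hwc_rgb):
    """Mimic the three Transformer settings on an H x W x C RGB image in [0, 1]."""
    chw = image_hwc_rgb.transpose(2, 0, 1)   # H x W x C -> C x H x W
    bgr = chw[[2, 1, 0], :, :]               # RGB -> BGR along the channel axis
    return bgr * 255.0                       # rescale [0, 1] -> [0, 255]

# A tiny 2 x 2 pure-red image, in the same layout caffe.io.load_image returns.
red = np.zeros((2, 2, 3), dtype=np.float32)
red[:, :, 0] = 1.0

out = manual_preprocess(red)
print(out.shape)      # (3, 2, 2)
print(out[:, 0, 0])   # red now sits in the last channel (B, G, R order)
```

Checking a single-colour image like this is a quick way to confirm that the channel swap is going in the direction you expect.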

Now we are set to feed our images to the network to extract features. Let’s initialize a caffe.Net object, which builds the network by combining the network architecture (model_def) with the pretrained weights (pretrained_model). Since we already have the pretrained weights (weights trained from scratch work the same way), we create the caffe.Net instance in caffe.TEST mode. caffe.TEST mode is used either to predict the class of an image (in a classification problem) or to extract features.

% python
# Create a net object
net = caffe.Net(model_def, pretrained_model, caffe.TEST)

“The net is a set of layers connected in a computation graph — a directed acyclic graph (DAG) to be exact.”

The utility function get_this_batch provided below takes the full list of images and returns the batch at a given index, containing at most batch_size images. Feature extraction is not very computationally expensive, because all it does is a single forward pass: at each layer, the input is multiplied by the stored weights, the bias is added, and the result is passed through a non-linear layer, along with whatever other operations the model definition specifies. Still, choose batch_size wisely, as we do not want to run the computation on a batch of images that does not fit in memory.
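As a rough illustration of the "multiply by the weights, add the bias, apply a non-linearity" step, here is a toy NumPy sketch of one fully connected layer followed by a ReLU. The sizes are tiny stand-ins chosen so the numbers can be checked by hand, the function name is hypothetical, and I am assuming the weight matrix is laid out as (outputs x inputs).

```python
import numpy as np

def fully_connected_relu(inputs, weights, bias):
    """One forward step: inputs times transposed weights, plus bias, then ReLU."""
    pre_activation = inputs.dot(weights.T) + bias
    return np.maximum(pre_activation, 0.0)

# A batch of 2 inputs with 2 values each.
batch = np.array([[1.0, 2.0],
                  [0.0, -1.0]])
# 3 output units, each with 2 input weights.
weights = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 1.0]])
bias = np.array([0.0, 0.5, -1.0])

features = fully_connected_relu(batch, weights, bias)
print(features.shape)  # (2, 3): one row of features per input in the batch
```

The whole batch is handled in one matrix product, which is why a larger batch_size mostly costs memory.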

% python
# returns the batch of images at position "batch_index", at most "batch_size" long
def get_this_batch(image_list, batch_index, batch_size):
    start_index = batch_index * batch_size
    next_batch_size = batch_size
    image_list_size = len(image_list)
    # batches might not be evenly divided
    if start_index + batch_size > image_list_size:
        remaining_size_at_last_index = image_list_size - start_index
        next_batch_size = remaining_size_at_last_index
    # slicing works for both Python lists and NumPy arrays
    return image_list[start_index:start_index + next_batch_size]
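As an alternative to calling get_this_batch with an explicit batch index, the same batching can be sketched as a generator that walks the list in steps of batch_size. This is just one possible variation, and iterate_batches is a name I made up for it.

```python
def iterate_batches(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# 25 stand-in items split into batches of 10: the last batch is smaller.
demo_items = list(range(25))
batch_lengths = [len(batch) for batch in iterate_batches(demo_items, 10)]
print(batch_lengths)  # [10, 10, 5]
```

Because the loop steps through the list by start offset, there is no need to work out the number of batches in advance, and an empty list simply produces no batches.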

The following for loop takes one batch at a time, extracts features by passing the batch through the network, copies the features from the network blobs and finally stores them in a list.

% python
# number of batches, rounded up so that a final partial batch is included
total_batch_nums = (len(images_loaded_by_caffe) + batch_size - 1) // batch_size
features_all_images = []
images_loaded_by_caffe = np.array(images_loaded_by_caffe)
# loop through all the batches
for j in range(total_batch_nums):
    image_batch_to_process = get_this_batch(images_loaded_by_caffe, j, batch_size)
    num_images_being_processed = len(image_batch_to_process)
    # note that each image is passed through the transformer
    # before being copied into the data layer
    net.blobs['data'].data[:num_images_being_processed] = [transformer.preprocess('data', img) for img in image_batch_to_process]

    # BEWARE: blob arrays are overwritten on every forward pass
    res = net.forward()
    # actual batch feature extraction; copy only the rows filled in this batch
    features_for_this_batch = net.blobs[extract_from_layer].data[:num_images_being_processed].copy()
    features_all_images.extend(features_for_this_batch)

Now that we have the features, we bundle them with the image paths into a dictionary, serialize it with pickle, and write it to disk.

% python
pkl_object = {"filename": image_paths_list, "features": features_all_images}
with open(output_pkl_file_name, 'wb') as output:
    # protocol 2 keeps the file readable from both Python 2 and Python 3
    pickle.dump(pkl_object, output, 2)
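To use the features later, the pickle file has to be read back. The following is a small sketch of how that could look, assuming the dictionary layout written above (a "filename" list and a "features" list in matching order). The helper name load_features is hypothetical, and the demo uses tiny stand-in values rather than real fc7 features, which have 4096 values per image.

```python
import os
import pickle
import tempfile

def load_features(pkl_path):
    """Load the stored dictionary and pair each filename with its feature."""
    with open(pkl_path, "rb") as handle:
        stored = pickle.load(handle)
    return dict(zip(stored["filename"], stored["features"]))

# Round trip with stand-in data written to a temporary file.
demo = {"filename": ["/path/to/image1", "/path/to/image2"],
        "features": [[0.1, 0.2], [0.3, 0.4]]}
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as tmp:
    pickle.dump(demo, tmp, 2)
    demo_pkl = tmp.name

by_name = load_features(demo_pkl)
os.remove(demo_pkl)
print(by_name["/path/to/image2"])
```

Keying the features by file name makes it harder to mix up rows if the image list is reordered later.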

I hope this helps. Here is the complete code used in this article. I’ve also listed a few references that helped me through my learning process.

Thank you. :-)

References

  1. caffe.Classifier() vs caffe.Net() for feature extraction
  2. caffe.io and example
  3. Stanford’s cs231n class
  4. Andrej Karpathy’s YouTube channel and blogs on deep learning (here and here)
  5. Step-by-Step guide to Caffe 1
  6. Step-by-Step guide to Caffe 2
  7. Intro to Deep Learning with Caffe and python
  8. And a lot more… Good luck ;-)

Originally published at sharmaashish.com.
