Using Caffe with your own dataset

What a time to be alive! There are plenty of tutorials for TensorFlow, Keras, Torch, and even Caffe, but most of them use standard datasets such as MNIST or IMDB comments. A couple of years ago I was figuring out how to use my own heap of pictures with Caffe, and there are still surprisingly few tutorials on this topic on the Internet. So I think it's not a bad idea to write another step-by-step, more detailed guide covering:

  1. Dataset preparation
  2. Caffe network files preparation
  3. From scratch training and finetuning

Preparing the dataset

The usual situation looks like this: you have pictures downloaded from the Internet or from some other source. Let's say we are solving a classification problem, and I hope you at least saved the pictures of different classes to different folders. The Stanford Dogs Dataset has almost the same structure. You can download it from http://vision.stanford.edu/aditya86/ImageNetDogs/.

Afghan hound

Let's not rely on the train/test split from the website and build our own instead. For the Caffe dataset creation we will need two files: train.txt and val.txt. They will contain paths to images and class numbers for the train and test data respectively. A part of such a file can look like the following:

...
63_n02101006-Gordon_setter138.jpg 63
37_n02095314-wire-haired_fox_terrier75.jpg 37
16_n02089973-English_foxhound133.jpg 16
67_n02102177-Welsh_springer_spaniel105.jpg 67
32_n02093859-Kerry_blue_terrier28.jpg 32
48_n02097298-Scotch_terrier137.jpg 48
25_n02091831-Saluki52.jpg 25
....

For example, you can do it with my old script (not that good, but it works), which moves all the images into one folder and generates train.txt and val.txt.
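The idea can be sketched roughly like this. This is a minimal illustration, not my original script: it assumes one subfolder per class, only writes the two list files (the actual file moving is left out), and the exact filename prefix format is approximate:

```python
import os
import random

def build_split(data_root, out_dir, val_fraction=0.2, seed=42):
    """Scan the per-class subfolders of data_root and write train.txt / val.txt
    into out_dir. Each line is '<filename> <class_index>'. Filenames are
    prefixed with the class index and folder name so they stay unique after
    being moved into a single folder (the moving itself is omitted here)."""
    classes = sorted(d for d in os.listdir(data_root)
                     if os.path.isdir(os.path.join(data_root, d)))
    entries = []
    for idx, cls in enumerate(classes):
        for fname in sorted(os.listdir(os.path.join(data_root, cls))):
            entries.append(("%d_%s_%s" % (idx, cls, fname), idx))
    random.Random(seed).shuffle(entries)  # shuffle before splitting
    n_val = int(len(entries) * val_fraction)
    splits = {"val.txt": entries[:n_val], "train.txt": entries[n_val:]}
    for name, rows in splits.items():
        with open(os.path.join(out_dir, name), "w") as f:
            for fname, idx in rows:
                f.write("%s %d\n" % (fname, idx))
    return splits
```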

So, let's assume we now have our dataset in one folder, along with train.txt and val.txt containing the corresponding paths and labels.

Converting to LMDB

To feed Caffe a large image dataset, the LMDB format is a good choice. Caffe already ships an example script (I assume you have Caffe built on your machine) at caffe/examples/imagenet/create_imagenet.sh.

We need to change the following:

  • EXAMPLE=examples/dogs : where the LMDBs will be stored
  • DATA=data/dogs/dogs_data : the folder with train.txt and val.txt
  • TRAIN_DATA_ROOT : the folder with the training images
  • VAL_DATA_ROOT : the folder with the test images (with the script above it's the same folder)
  • RESIZE=true : we need to resize all photos to the same size
  • And the following piece of code:
echo "Creating train lmdb..."

GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $TRAIN_DATA_ROOT \
    $DATA/train.txt \
    $EXAMPLE/dogs_train_lmdb

echo "Creating val lmdb..."

GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $VAL_DATA_ROOT \
    $DATA/val.txt \
    $EXAMPLE/dogs_val_lmdb
  • Above, $DATA/train.txt and $DATA/val.txt are the inputs, and $EXAMPLE/dogs_train_lmdb and $EXAMPLE/dogs_val_lmdb are the output LMDBs

You can also use make_imagenet_mean.sh (you will need the mean file for some Caffe prototxts) to generate a mean file from the input images, for subtraction in the preprocessing step.
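For intuition: the per-channel mean values you will see in the prototxt below (104, 117, 123 for ImageNet) are just averages over every pixel of every training image. A minimal numpy sketch, assuming images are already loaded as HxWx3 arrays in BGR order (as Caffe stores them):

```python
import numpy as np

def channel_means(images):
    """Per-channel mean over a list of same-sized HxWx3 uint8 arrays (BGR).
    Note: compute_image_mean produces a full HxW mean *image*; many prototxts
    instead use just these three per-channel scalars via mean_value."""
    stacked = np.stack([img.astype(np.float64) for img in images])
    return stacked.mean(axis=(0, 1, 2))  # average over images, rows, columns
```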

Preparing the network architecture

Let's open the network architecture files in caffe/models. In train_val.prototxt we should change the image size (we resized everything with RESIZE=true) and the paths to the LMDBs ($EXAMPLE/dogs_train_lmdb, $EXAMPLE/dogs_val_lmdb). In our case, let's try GoogleNet.

name: "GoogleNet"
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 224
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }
  data_param {
    source: "examples/dogs/dogs_train_lmdb"
    batch_size: 32
    backend: LMDB
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    crop_size: 224
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }
  data_param {
    source: "examples/dogs/dogs_val_lmdb"
    batch_size: 40
    backend: LMDB
  }
}

Next, we also need to change the number of outputs in the last fully-connected layer of each classifier (GoogleNet has three of them) to 120 in our case. In the same file we look for the following part:

layer {
  name: "loss1/classifier"
  type: "InnerProduct"
  bottom: "loss1/fc"
  top: "loss1/classifier"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 120
    weight_filler {
      type: "xavier"
      std: 0.0009765625
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "loss1/loss"
  type: "SoftmaxWithLoss"
  bottom: "loss1/classifier"
  bottom: "label"
  top: "loss1/loss1"
  loss_weight: 0.3
}
layer {
  name: "loss1/top-1"
  type: "Accuracy"
  bottom: "loss1/classifier"
  bottom: "label"
  top: "loss1/top-1"
  include {
    phase: TEST
  }
}
layer {
  name: "loss1/top-5"
  type: "Accuracy"
  bottom: "loss1/classifier"
  bottom: "label"
  top: "loss1/top-5"
  include {
    phase: TEST
  }
  accuracy_param {
    top_k: 5
  }
}
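The two Accuracy layers above compute top-1 and top-5 accuracy during the TEST phase. As a sanity check of what that means, here is a rough numpy equivalent (a hypothetical helper for illustration, not part of Caffe):

```python
import numpy as np

def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores.
    scores: (N, num_classes) float array; labels: (N,) integer array."""
    topk = np.argsort(scores, axis=1)[:, -k:]  # indices of the k best classes
    hits = [labels[i] in topk[i] for i in range(len(labels))]
    return sum(hits) / float(len(labels))
```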

You can read more about the other parameters of the prototxt file here: http://caffe.berkeleyvision.org/tutorial/layers.html. After that, you can change the training parameters (learning rate, weight decay, and so on) in models/bvlc_googlenet/quick_solver.prototxt.
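For reference, a solver for this setup could look roughly like the following. The values here are illustrative guesses for this dataset, not the contents of the shipped quick_solver.prototxt:

```
net: "models/bvlc_googlenet/train_val.prototxt"
test_iter: 100        # number of validation batches per test pass
test_interval: 1000
base_lr: 0.01
lr_policy: "poly"
power: 0.5
momentum: 0.9
weight_decay: 0.0002
max_iter: 100000
snapshot: 10000
snapshot_prefix: "models/bvlc_googlenet/dogs"
solver_mode: GPU
```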

If you're going to finetune your network, Caffe supports it in a very easy way. You just have to rename the layers whose weights you don't want to copy from the pretrained network, but instead want to learn from random initialization:

layer {
  name: "loss1/loss"
  type: "SoftmaxWithLoss"
  bottom: "loss1/classifier_dog"
  bottom: "label"
  top: "loss1/loss1"
  loss_weight: 0.3
}

You can do the same for any other layer; maybe you also want to learn the last convolutional layers from scratch, not only the fully-connected ones.
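For example, the renamed classifier that the loss snippet above references (loss1/classifier_dog) would keep the same definition as before, just with a new name and top, so Caffe skips the weight copy and initializes it randomly:

```
layer {
  name: "loss1/classifier_dog"
  type: "InnerProduct"
  bottom: "loss1/fc"
  top: "loss1/classifier_dog"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 120
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" value: 0 }
  }
}
```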

Training and finetuning

For training the network from scratch we have already prepared everything we need. The following line, run from the caffe folder, will start the training process.

./build/tools/caffe train -solver models/bvlc_googlenet/quick_solver.prototxt -gpu 0

For finetuning, let's say from GoogleNet trained on the ImageNet dataset, you first have to download the weights from http://dl.caffe.berkeleyvision.org/bvlc_googlenet.caffemodel and then place them into models/bvlc_googlenet/.

To run the finetuning process, slightly change the line from above like this:

./build/tools/caffe train -solver models/bvlc_googlenet/quick_solver.prototxt -weights models/bvlc_googlenet/bvlc_googlenet.caffemodel -gpu 0

After all of this, you should reach 70–75% accuracy in a short time.

Thank you for your attention, I hope this small guide was useful for you. As you can see, everything you need already comes in the box with Caffe: you just have to adjust the basic config files and scripts. If you have some questions or comments, I'll be glad to talk in the comments :)

P.S.
Follow me on Facebook for AI articles that are too short for Medium, on Instagram for personal stuff, and on LinkedIn!