Part 1 of 3: Data Preparation — Transfer learning using tensorflow’s object detection model on Mac

Vivi E
Coinmonks
6 min read · Jun 22, 2018


This guide is heavily inspired by Dat Tran’s article, and I mostly used existing tools made by other people to accomplish this task. Before you decide to follow it, please note that I used macOS High Sierra and at least two versions of Python, so I advise that you set up virtual environments on your local computer first.

Step 1. Download images from the web

I downloaded several rat images using a Google Chrome extension named Fatkun Batch Download Image. This extension lets you choose one or more Chrome tabs and download the images loaded in them. It is also important that the images are saved in a single format, in our case JPEG, and that they are named uniformly. Saving them as JPEG matters for the labelling step.

Open a Google Chrome tab that shows images of the object you want to detect, in this case rats.

Figure 1. Google image search on rats

Click the Fatkun Batch Download Image button on the upper right part of your google chrome browser.

Figure 2. Fatkun Batch Download Google Extension Icon

Click the More Options button and select the Rename based on radio option so that you do not have to deal with images that share a filename or contain unwanted character codes.

You can also toggle individual images on or off depending on whether you want them downloaded.

Figure 3. Toggle images on or off

Make sure that the Ask where to save each file before downloading option is turned off so that the images are saved automatically.

Figure 4. Edit save settings

Save the images and check the output folder that contains them. The folder name will be something like rat _ Google Search.
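If you want to double-check the download before labelling, here is a minimal sketch (not part of the original workflow) that copies only real JPEG files into a clean folder under uniform names. The function names, the `rat` prefix, and the four-digit numbering are my own assumptions; the only fact it relies on is that JPEG files start with the bytes `FF D8 FF`.

```python
import shutil
from pathlib import Path

JPEG_MAGIC = b"\xff\xd8\xff"  # every JPEG file starts with these three bytes


def is_jpeg(path):
    """Return True if the file content starts with the JPEG signature."""
    with open(path, "rb") as handle:
        return handle.read(3) == JPEG_MAGIC


def collect_jpegs(source_dir, dest_dir, prefix="rat"):
    """Copy every JPEG in source_dir to dest_dir as <prefix>_0001.jpg, <prefix>_0002.jpg, ...

    Files that are not JPEGs are skipped, so they never reach the labelling step.
    Returns the list of newly created paths.
    """
    source, dest = Path(source_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    # Sort so the numbering is reproducible between runs.
    for path in sorted(p for p in source.iterdir() if p.is_file()):
        if not is_jpeg(path):
            continue
        target = dest / f"{prefix}_{len(copied) + 1:04d}.jpg"
        shutil.copyfile(path, target)
        copied.append(target)
    return copied
```

For example, `collect_jpegs("rat _ Google Search", "images")` would fill a hypothetical `images` folder with `rat_0001.jpg`, `rat_0002.jpg`, and so on, leaving the original downloads untouched.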

Note: You may also get your images from your own videos as long as you extract the frames in preparation for the next step — labelling.

Step 2. Label Images

For labelling the downloaded images, I used LabelImg. To avoid possible problems, make sure that your images are saved in JPEG format.

Clone the LabelImg repository. In my case, I used a Python 3.5.5 virtual environment and followed the build instructions here for my laptop running macOS High Sierra. If you are interested in learning more about setting up virtual environments on your Mac, here’s a helpful guide.

# Build for macOS High Sierra
cd build-tools
chmod +x ./build-for-macos.sh
./build-for-macos.sh

After building the project on my local machine, I proceeded to pip install the needed libraries. You can also opt to religiously follow the instructions from the original repository.

# These are chosen parts from Python 3 Virtualenv + Binary
pip install py2app
pip install PyQt5 lxml
make qt5py3
rm -rf build dist

After the above steps, I launched LabelImg by running python labelImg.py.

Figure 5. labelImg

Now you can proceed to labelling the images. The Open Dir feature is very helpful since we have all our images in one folder. I also used the Change Save Dir feature to keep all my output in one place. Also make sure that your save format is PascalVOC.

Figure 6. Choose save format

Continue to label everything until you get an output similar to this:

Figure 7. LabelImg output

Step 3. Split data into training and testing sets

Before splitting the data, I needed a single CSV file containing all the labels. For this part, I used xml_to_csv.py from here. First, I put all my images in raccoon_dataset/images and all XML files in raccoon_dataset/annotations, and emptied raccoon_dataset/data, which will contain the output files. You can change this layout as you please, but make sure to make the corresponding changes in raccoon_dataset/xml_to_csv.py.

Figure 8. Annotations

Make the necessary changes in raccoon_dataset/xml_to_csv.py and create your CSV file by running python xml_to_csv.py. Your output will look something like the picture below.

Figure 9. Sample output of xml_to_csv.py
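To make the conversion less of a black box, here is a minimal standard-library sketch of what such a script does: it reads each PascalVOC XML file and writes one CSV row per bounding box. This is not Dat Tran's script. The function names are mine, and the column names are an assumption about what the later steps expect, so compare them with the real xml_to_csv.py before relying on them.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed column layout: one row per labelled box.
COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def xml_to_rows(xml_path):
    """Return one row per <object> found in a single PascalVOC annotation file."""
    root = ET.parse(str(xml_path)).getroot()
    filename = root.findtext("filename")
    size = root.find("size")
    width = int(size.findtext("width"))
    height = int(size.findtext("height"))
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows.append([
            filename, width, height, obj.findtext("name"),
            int(box.findtext("xmin")), int(box.findtext("ymin")),
            int(box.findtext("xmax")), int(box.findtext("ymax")),
        ])
    return rows


def annotations_to_csv(annotation_dir, csv_path):
    """Collect every *.xml file in annotation_dir into one CSV; return the row count."""
    rows = []
    for xml_path in sorted(Path(annotation_dir).glob("*.xml")):
        rows.extend(xml_to_rows(xml_path))
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(COLUMNS)
        writer.writerows(rows)
    return len(rows)
```

With the folder layout described above, a call such as `annotations_to_csv("annotations", "data/rat_labels.csv")` would produce the combined label file.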

For splitting the data, I used split labels.ipynb from here. Make sure to read each cell and edit the hard-coded variables to fit your needs. If you are not familiar with the UI, read up on Jupyter notebooks and how to set them up on your local machine.

Figure 10. Dat Tran’s split labels

After running split labels.ipynb successfully, you should have something like the picture below. I split my data 80–20 for training and testing respectively.

Figure 11. Sample output of `split labels.ipynb`
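If you would rather not use the notebook, the same kind of 80–20 split can be sketched with the standard library. This is my own approximation, not a copy of the notebook: it assumes the CSV has a `filename` column, and it splits by image rather than by row so that all boxes drawn on one image end up in the same set. The function names and the seed are placeholders.

```python
import csv
import random
from collections import defaultdict


def split_by_filename(rows, train_fraction=0.8, seed=1):
    """Split label rows into (train, test), keeping each image's boxes together."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["filename"]].append(row)
    filenames = sorted(groups)
    # A fixed seed makes the split reproducible.
    random.Random(seed).shuffle(filenames)
    cut = round(len(filenames) * train_fraction)
    train = [row for name in filenames[:cut] for row in groups[name]]
    test = [row for name in filenames[cut:] for row in groups[name]]
    return train, test


def split_csv(labels_csv, train_csv, test_csv, train_fraction=0.8, seed=1):
    """Read one label CSV and write the train and test CSVs; return their row counts."""
    with open(labels_csv, newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames
        rows = list(reader)
    train, test = split_by_filename(rows, train_fraction, seed)
    for path, subset in ((train_csv, train), (test_csv, test)):
        with open(path, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(subset)
    return len(train), len(test)
```

Calling `split_csv("data/rat_labels.csv", "data/train_labels.csv", "data/test_labels.csv")` would write the two label files that the next step expects.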

Step 4. Convert images to TFRecord

At this point, your raccoon_dataset/data should contain the files listed below. This means that we now have the training and testing labels and are ready to convert the data to TFRecord format.

rat_labels.csv
test_labels.csv
train_labels.csv

Update PYTHONPATH

To do this, I used raccoon_dataset/generate_tfrecord.py, made by Dat Tran and published here (THANK YOU, MASTER). Please note that you need TensorFlow’s object detection library to accomplish this part. If you are working on a Mac (High Sierra), follow this setup guide that I used.

Once your PYTHONPATH includes the TensorFlow object detection libraries, we can start the conversion.
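A quick way to confirm that the path is set up is to ask Python whether it can locate the packages without importing them. The sketch below uses only the standard library. The helper name is mine, and the assumption that the library is importable as `object_detection` comes from how the TensorFlow models repository is laid out, so adjust the names if your setup differs.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be found on sys.path (which includes PYTHONPATH)."""
    # find_spec locates the module without importing it.
    # Only pass top-level names here; dotted names import their parent package first.
    return importlib.util.find_spec(name) is not None


def report(names=("tensorflow", "object_detection")):
    """Return a dict mapping each module name to whether it was found."""
    return {name: module_available(name) for name in names}
```

Running `print(report())` in the same terminal session where you exported PYTHONPATH should show `True` for both entries before you move on.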

Prepare label map

Edit the label map part of raccoon_dataset/generate_tfrecord.py accordingly:

# TO-DO replace this with label map
def class_text_to_int(row_label):
    if row_label == 'rat':
        return 1
    else:
        return None
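If you later label more than one class, a dictionary can be easier to maintain than a chain of if statements. The following is only a sketch of that idea: `LABEL_MAP` and the optional `label_map` parameter are names I made up, and any extra class you add is your own choice. Ids start at 1 because the object detection API reserves 0 for the background.

```python
# Hypothetical label map: add further entries (for example "mouse": 2) as needed.
LABEL_MAP = {"rat": 1}


def class_text_to_int(row_label, label_map=LABEL_MAP):
    """Return the integer id for a class name, or None if the name is unknown."""
    return label_map.get(row_label)


def unknown_labels(labels, label_map=LABEL_MAP):
    """Return the sorted set of labels that have no id, to catch typos before converting."""
    return sorted({label for label in labels if label not in label_map})
```

Because the extra parameter has a default, existing calls such as `class_text_to_int(row['class'])` keep working unchanged.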

Generate TFRecord files

Here’s a sample usage provided in raccoon_dataset/generate_tfrecord.py:

Usage:
# From tensorflow/models/
# Create train data:
python generate_tfrecord.py --csv_input=data/train_labels.csv --output_path=train.record
# Create test data:
python generate_tfrecord.py --csv_input=data/test_labels.csv --output_path=test.record

Lastly, to make sure that you produced your TFRecord files correctly, compare their total size with the total size of your image files. They should be about the same.
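This size check is easy to script. The sketch below adds up the file sizes and returns the ratio between the record files and the images. The function names and the `*.jpg` pattern are assumptions of mine, and what counts as "about the same" is a judgment call, since the records also store the box coordinates and labels next to the image bytes.

```python
from pathlib import Path


def total_size(paths):
    """Return the combined size in bytes of the given files."""
    return sum(Path(p).stat().st_size for p in paths)


def size_ratio(record_paths, image_dir, pattern="*.jpg"):
    """Return (total size of record files) / (total size of images in image_dir)."""
    image_bytes = total_size(Path(image_dir).glob(pattern))
    if image_bytes == 0:
        raise ValueError(f"no files matching {pattern} found in {image_dir}")
    return total_size(record_paths) / image_bytes
```

For example, `size_ratio(["train.record", "test.record"], "images")` should come out close to 1 if every image made it into one of the two files.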

