Auto annotate images for TensorFlow object detection

Alvaro Leandro Cavalcante Carneiro · Published in Analytics Vidhya · Dec 24, 2019

This article and library version are outdated; please refer to my new article to check the new library version: https://medium.com/p/acf410a600b8

Everyone who has worked on an object detection problem knows how tedious and time-consuming annotating images is.

Annotating itself is pretty simple, as the GIF below shows: we just need to mark the object's location and specify its class. There is no problem doing this for a few images, but when we have hundreds or even thousands of samples in our dataset, which is normal in deep learning, it becomes a bottleneck for our work.

The boring, time-consuming process of annotating images.

In one of my recent projects, I needed to create my own dataset of coffee leaves and also label their diseases. At first, I annotated the images by hand in the LabelImg software and trained the model, but I was really unsatisfied with this approach: I was spending all my time creating and annotating the dataset, which made it difficult to scale up the project.

So I thought of a simple way to change this process and turn it into a more automatic one.

With this new approach, I created a Python class called GenerateXml, which does the hard work for me: it annotates the images by running inference with a pre-trained model to get the bounding box positions, and then creates an XML file in the format used for training.

AUTOMATING THE PROCESS

To run this project on your machine you will need to clone the TensorFlow repository on GitHub and install all the dependencies. As this can be a little difficult the first time, here you can find a complete tutorial explaining the process step by step.

The auto_annotate project was built with TensorFlow 1.14, but it's also compatible with TF 2.x.

Almost everything we need is inside the folder research/object_detection. There you can check the notebook called object_detection_tutorial, which explains in detail how to load a model and run new inferences.

In my case, I used the auto annotate inside a NodeJS API, but here we will create a simple directory scheme just to show the behavior; you can then modify it to use it anywhere you want.

The scheme has a root folder for the whole thing called auto_annotate, and inside it I have the following folders: images, scripts, results, graphs, and xml (see the layout sketch after this list).

  • images contains all the photos you want to run inference on and create XML files for.
  • results holds the images output by the inference.
  • scripts contains all the Python algorithms we will use.
  • graphs holds the frozen inference graph and the label map.
  • xml is the folder containing the generated XML files.
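For reference, the directory layout looks roughly like this (the file names inside graphs are the usual TensorFlow export names and may differ in your case):

auto_annotate/
├── images/   # input photos to annotate
├── scripts/  # detection_images.py, generate_xml.py, and the modified visualization_utils.py
├── results/  # images drawn with the inferred boxes
├── graphs/   # frozen_inference_graph.pb, label_map.pbtxt
└── xml/      # generated annotation files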

You can find the code that I used on my GitHub: https://github.com/AlvaroCavalcante/auto_annotate

The main file is detection_images.py, responsible for loading the frozen model and running inference on the images in the folder. You will need to change the first lines to add your own paths if necessary. I also added some lines to change the image dimensions and save the results; everything else is similar to the original file that you can find in the TensorFlow directory.
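If you are curious, here is a minimal sketch of how a frozen graph is loaded and run, following the pattern from the TensorFlow tutorial notebook; the graph path and the placeholder image are assumptions based on the folder scheme above, and the tensor names are the standard ones exported by the TensorFlow Object Detection API:

import numpy as np
import tensorflow as tf

PATH_TO_FROZEN_GRAPH = 'graphs/frozen_inference_graph.pb'  # assumed path

# Load the serialized graph (TF 1.x style, also available in TF 2.x via compat.v1)
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
        od_graph_def.ParseFromString(fid.read())
        tf.import_graph_def(od_graph_def, name='')

# Run inference on a single image (a NumPy array of shape [height, width, 3])
with detection_graph.as_default(), tf.compat.v1.Session() as sess:
    image = np.zeros((640, 480, 3), dtype=np.uint8)  # placeholder for a real image
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
    scores = detection_graph.get_tensor_by_name('detection_scores:0')
    classes = detection_graph.get_tensor_by_name('detection_classes:0')
    out_boxes, out_scores, out_classes = sess.run(
        [boxes, scores, classes],
        feed_dict={image_tensor: np.expand_dims(image, 0)})

The predicted boxes come back with normalized coordinates between 0 and 1, which is why we will multiply them by the image dimensions further below.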

The generate_xml.py file receives the name of the inferred class, the image dimensions, the file name, and an array of dictionaries with the bounding box coordinates.

All this information is passed by the file visualization_utils.py, which is also found in the TensorFlow directory; we just need to make some adaptations, as follows.

array_position = []
im_height, im_width, shape = image.shape  # get image dimensions
for box, color in box_to_color_map.items():  # loop over the predicted boxes
    ymin, xmin, ymax, xmax = box
    dict_position = {'xmin': 0, 'xmax': 0, 'ymin': 0, 'ymax': 0}
    # add the positions to the dict; we multiply by the image
    # dimensions to get the real values in pixels
    dict_position['ymin'] = ymin * im_height
    dict_position['xmin'] = xmin * im_width
    dict_position['ymax'] = ymax * im_height
    dict_position['xmax'] = xmax * im_width
    array_position.append(dict_position)

With this setup, we just need to instantiate the class and call its generate_basic_structure method to create the XML.

if new_xml != False:  # prevents calling the class when we don't have predictions in the image
    xml = generate_xml.GenerateXml(array_position, im_width, im_height, class_name, file_name)
    xml.generate_basic_structure()

In this method, we use ElementTree (available in Python's standard library as xml.etree.ElementTree, so no pip install is needed) to create an XML structure based on the one LabelImg generates automatically, filling in the box positions with a for loop and saving the file. It is important to remember that the XML file must have the same name as the inferred image file.
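To make this concrete, here is a minimal sketch of what such a class can look like; it is not the exact code from my repository, but it produces the same kind of Pascal VOC structure that LabelImg reads:

import xml.etree.ElementTree as ET

class GenerateXml:
    def __init__(self, box_array, im_width, im_height, class_name, file_name):
        self.box_array = box_array
        self.im_width = im_width
        self.im_height = im_height
        self.class_name = class_name
        self.file_name = file_name

    def generate_basic_structure(self):
        # Pascal VOC-style structure, matching what LabelImg produces
        annotation = ET.Element('annotation')
        ET.SubElement(annotation, 'filename').text = self.file_name
        size = ET.SubElement(annotation, 'size')
        ET.SubElement(size, 'width').text = str(self.im_width)
        ET.SubElement(size, 'height').text = str(self.im_height)
        ET.SubElement(size, 'depth').text = '3'
        for box in self.box_array:  # one <object> tag per bounding box
            obj = ET.SubElement(annotation, 'object')
            ET.SubElement(obj, 'name').text = self.class_name
            bndbox = ET.SubElement(obj, 'bndbox')
            ET.SubElement(bndbox, 'xmin').text = str(int(box['xmin']))
            ET.SubElement(bndbox, 'ymin').text = str(int(box['ymin']))
            ET.SubElement(bndbox, 'xmax').text = str(int(box['xmax']))
            ET.SubElement(bndbox, 'ymax').text = str(int(box['ymax']))
        # the XML must share its name with the inferred image
        # (the xml/ folder is assumed to exist, as in the scheme above)
        xml_name = self.file_name.rsplit('.', 1)[0] + '.xml'
        ET.ElementTree(annotation).write('xml/' + xml_name)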

RUNNING THE SCRIPT

Before running the algorithm, you will need to replace the file visualization_utils.py in the TensorFlow folder research/object_detection/utils with the file we have modified. (Remember that the TensorFlow repository is constantly changing, so depending on when you follow this tutorial, replacing the TF visualization file with mine may not work; copying and pasting only the changed lines is safer, but you can try replacing the file first.)

You will also need to copy the file generate_xml.py into the utils folder, in the same location as visualization_utils.py.
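Assuming you cloned the TensorFlow repository to ~/models and that both files live in the scripts folder (adjust the paths to your setup), the copies look like this:

cp scripts/visualization_utils.py ~/models/research/object_detection/utils/visualization_utils.py
cp scripts/generate_xml.py ~/models/research/object_detection/utils/generate_xml.py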

After that, you just need to enter your folder (auto_annotate in my case) and run:

python3 scripts/detection_images.py

If everything is working, you will see the inferred images inside the results folder, like this:

My coffee leaves inferred by the model!

And in the xml folder you will find the files containing the annotations. Let's open an image and its XML in LabelImg.

Here we are!!! Labels created automatically

Obviously, it's not perfect yet: you may want to adjust the box positions to be more precise (depending on your problem domain), and you will definitely need to create new labels whenever the pre-trained model doesn't infer correctly, but it's still much faster than doing all the work by hand.

The more we train our model, the more accurate the inferences become and the easier the annotation process gets.

CONCLUSION

A more automatic way to annotate datasets is very important for everyone who works with object detection, allowing us to focus on what really matters instead of wasting time on labeling. This is just a first step; I hope new methods and algorithms soon arise to facilitate this process even further.

Thanks for reading. I hope this tutorial helps you; if you run into trouble or have any doubts, let me know and I will be happy to help. :)
