Displaying Image Contents in Google Earth with the Keras Machine Learning Library in Python

This is the third post in a series about developing a script that takes your photos and creates a working KML file showing where they were taken on Google Earth. The first post covered the script basics, and the second one introduced reverse geocoding.

The generated KML still has one problem: each placemark is named after its photo’s file path.

I wanted to fix that, since it looks ugly on Google Earth. To do so, I needed to know what was in each image. Renaming half a thousand photos by hand is tedious, and a machine learning algorithm is better suited to the task. The technique is called image classification. In the case I’m going to describe, it uses a convolutional neural network pre-trained on the ImageNet dataset to keep things simple. Under the hood, the network performs operations on arrays of numbers to figure out what’s in an image, which is itself just an array of pixel values. If you want to learn more, Brandon Rohrer explains how convolutional neural networks work.
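To give a feel for what “operations on arrays of numbers” means, here is a minimal pure-Python sketch of the core operation a convolutional layer performs: sliding a small kernel over the pixel array and summing element-wise products at each position. The edge-detecting kernel and the tiny image are made up for illustration; in ResNet50 the kernel values are learned during training, and the network stacks many such layers with nonlinearities and pooling in between.

```python
def convolve2d(pixels, kernel):
    """Slide `kernel` over `pixels` (stride 1, no padding) and return the feature map.

    Like most CNN libraries, this computes cross-correlation: the kernel is
    not flipped before being applied.
    """
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(pixels) - kernel_h + 1
    out_w = len(pixels[0]) - kernel_w + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Sum of element-wise products between the kernel and the patch under it.
            total = 0
            for di in range(kernel_h):
                for dj in range(kernel_w):
                    total += pixels[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# A dark-to-bright vertical edge and a kernel that responds to exactly that.
pixels = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
edge_kernel = [[-1, 1], [-1, 1]]
print(convolve2d(pixels, edge_kernel))  # [[0, 18, 0], [0, 18, 0]]
```

The large values in the output mark where the edge is; a deep network combines many such feature maps to recognize increasingly complex shapes.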

I studied at a university of information science, and part of my curriculum focused on machine learning, so much of what today’s industry considers hot is familiar territory: the same ideas, just with more resources behind them.

Now, I don’t have the kind of dataset Google has at its disposal, and I don’t want the script to be too complex for an ordinary user. The usual process is to build a labeled dataset and train your own model, but I wanted a pre-trained model so that people wouldn’t need to think about training or fine-tuning and could get results quick and dirty. Fortunately, that’s possible today.

Enter Keras. Keras is a machine learning library written in Python. It supports several back-ends, but by default it relies on Google’s own TensorFlow library. Installation is simple: you just install the tensorflow and keras packages on your system:

pip install tensorflow keras

You don’t actually have to do this, because both packages are covered in the script’s requirements. Chances are it will work out of the box, and that’s what I’m aiming for. If you want to make better use of your hardware, though, you can add GPU support if you have an Nvidia card, or Intel’s optimizations for TensorFlow on Linux systems. This usually involves installing the tensorflow-gpu package, or installing Intel’s own wheels with pip. Detailed instructions are beyond the scope of this article, which simply aims to provide quick photo tagging for users with modest hardware, where speed isn’t critical.
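If you do set up GPU support, a quick way to confirm that TensorFlow actually sees the card is to list its physical devices. This sketch assumes TensorFlow 2.1 or later, where `tf.config.list_physical_devices` exists; the helper name `visible_gpus` is hypothetical and not part of the script.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see.

    Returns an empty list if TensorFlow isn't installed, so the check never crashes.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return []
    # Assumes TensorFlow 2.1+; each entry is a PhysicalDevice with a .name attribute.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    print("GPUs visible to TensorFlow:", gpus if gpus else "none (running on CPU)")
```

An empty list simply means classification will run on the CPU, which is fine for the modest workloads this script targets.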

Keras already provides access to pre-trained models. The first time you run the classification, it downloads the model weights and caches them in the hidden .keras directory in your home folder (~/.keras/models). This script uses the ResNet50 application, whose weights are around 100 MB and were trained on the ImageNet dataset.
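The post doesn’t include the script’s classification code, so here is a minimal sketch of how ResNet50 classification can look with the Keras applications API. It assumes a Keras version that provides `keras.applications.resnet50` and `keras.preprocessing.image`, plus Pillow (which `load_img` needs). The function names `load_resnet50` and `classify_image` are hypothetical, not taken from the script.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def load_resnet50():
    """Build ResNet50 with ImageNet weights once and reuse it for every photo."""
    # Imported lazily because importing Keras/TensorFlow is slow.
    from keras.applications.resnet50 import ResNet50

    # The first call downloads the pre-trained weights (~100 MB) into
    # ~/.keras/models; later runs load the cached file instead.
    return ResNet50(weights="imagenet")


def classify_image(path, top=3):
    """Return the `top` guesses for one photo, most likely first.

    Each guess is a (class_id, class_name, score) tuple, as produced by
    decode_predictions.
    """
    import numpy as np
    from keras.applications.resnet50 import decode_predictions, preprocess_input
    from keras.preprocessing import image

    # ResNet50 expects a batch of 224x224 RGB images.
    img = image.load_img(path, target_size=(224, 224))
    batch = np.expand_dims(image.img_to_array(img), axis=0)  # shape (1, 224, 224, 3)
    predictions = load_resnet50().predict(preprocess_input(batch))
    # decode_predictions returns one list of guesses per image in the batch.
    return decode_predictions(predictions, top=top)[0]
```

Caching the model with `lru_cache` matters when you process hundreds of photos: building ResNet50 is far more expensive than classifying a single image. Usage would look like `for _, name, score in classify_image("/path/to/photo.jpg"): print(name, score)`.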

When the model classifies an image, it returns its guesses as class labels, each paired with a confidence score, ordered from most to least likely. The script takes the top two labels and joins them with a “/” character, and that string becomes the placemark’s name.
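The naming step can be sketched like this, assuming the guesses arrive as `(class_id, class_name, score)` tuples in the shape Keras’ `decode_predictions` returns. The helpers `placemark_name` and `kml_placemark` are hypothetical, the scores and coordinates in the example are made up, and the output is a single Placemark fragment rather than a complete KML document.

```python
import xml.etree.ElementTree as ET


def placemark_name(predictions, top=2, separator="/"):
    """Join the labels of the `top` most likely guesses, e.g. "tabby/tiger_cat"."""
    # decode_predictions already sorts best-first; sorting again keeps the
    # helper correct for input from anywhere else.
    ranked = sorted(predictions, key=lambda guess: guess[2], reverse=True)
    return separator.join(class_name for _, class_name, _ in ranked[:top])


def kml_placemark(name, longitude, latitude):
    """Serialize a minimal KML Placemark with the given name and position."""
    placemark = ET.Element("Placemark")
    # ElementTree escapes characters such as & and < in the name automatically.
    ET.SubElement(placemark, "name").text = name
    point = ET.SubElement(placemark, "Point")
    # KML lists coordinates as longitude,latitude (altitude is optional).
    ET.SubElement(point, "coordinates").text = f"{longitude},{latitude}"
    return ET.tostring(placemark, encoding="unicode")


guesses = [
    ("n02123045", "tabby", 0.61),
    ("n02123159", "tiger_cat", 0.27),
    ("n02124075", "Egyptian_cat", 0.08),
]
print(kml_placemark(placemark_name(guesses), 15.98, 45.81))
# <Placemark><name>tabby/tiger_cat</name><Point><coordinates>15.98,45.81</coordinates></Point></Placemark>
```

Taking two labels instead of one is a cheap hedge: when the top guess is wrong, the runner-up is often close.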

For the moment I am happy with the resulting script. Reverse geocoding and machine learning have shaped it up nicely.

The image classification results won’t be perfect, but having the image names prepopulated with descriptive terms should still save you a lot of time. You can then manually correct any names you find inaccurate.

In the end, implementing the ResNet50 Keras application was very simple, and its results are good enough without fine-tuning.

Then again… the categories could be translated automatically as well, so I added TextBlob to handle the translation. You just run:

python geotag-gallery.py --folder=/absolute/path/to/the/image/folder/ --language=hr

The language parameter is optional; if you leave it out, it defaults to English.
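The post doesn’t show how geotag-gallery.py reads these options, so the following is a hypothetical argparse sketch of the behavior described: `--language` falls back to English (`en`) when omitted. Treating `--folder` as required is an assumption.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line options described above (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Create a KML file that places your photos on Google Earth.")
    # Assumption: the folder is mandatory, since there is nothing to do without it.
    parser.add_argument("--folder", required=True,
                        help="absolute path to the folder with the photos")
    # Optional target language for the labels; English when omitted.
    parser.add_argument("--language", default="en",
                        help="language code for the translated labels (default: en)")
    return parser.parse_args(argv)
```

argparse accepts both `--language=hr` and `--language hr`, so either form from the command above would work.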

Beware, though: TextBlob is not a robust solution. It relies on a public-facing third-party API, so you may run into HTTP 503 errors depending on Google’s whims. The feature is experimental at best and not guaranteed to work, so you might be better off not using it.
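Because the translation service can fail at any time, one defensive pattern is to fall back to the English label instead of aborting the whole run. This is a sketch, not the script’s code: `translate_label` is hypothetical, and `translate` stands for any callable taking `(text, language)`. With the TextBlob releases of that era it could wrap `TextBlob(text).translate(to=language)`, though TextBlob has since deprecated its translation support.

```python
import logging


def translate_label(label, language, translate):
    """Translate `label` into `language`, keeping the English label on failure."""
    if language == "en":
        return label  # ImageNet labels are already English; skip the network call
    try:
        return translate(label, language)
    except Exception as exc:  # e.g. an HTTP 503 from the translation service
        logging.warning("Translating %r failed (%s); keeping English", label, exc)
        return label
```

Catching a broad `Exception` is deliberate here: a missing translation is a cosmetic problem, and it shouldn’t cost you the placemarks for the rest of your photos.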

So there you have it. As I said before, you can download the script from the repository. Pull requests are always welcome.


Originally published at www.offsetlab.net.