Photo Geolocation with Neural Networks: How to and How not to

Noufal Samsudin
Published in Analytics Vidhya, Aug 7, 2021

Building a neural network that can geotag an outdoor image, and how to catch a cheating neural network with Grad-CAM

Convolutional Neural Networks (CNNs) are currently the state of the art in computer vision. In this article I discuss my experience building a CNN model to geotag an image: it takes an image as input and predicts the location of that image as output. I also discuss a previous iteration that fooled me into believing the model had become crazy good at geolocating, when it was actually a case of data leakage.

The model was trained on a dataset of Google Street View images, which I generated by scraping images of random locations in India. The model is reasonably good at making predictions: it generally predicts in the vicinity of the actual location.

Here is a quick view of the results for the impatient:

Geo-location model evaluation: the shaded area indicates grids with high scores, and the predicted location is the average of the grid centroids weighted by their scores

These are handpicked good examples. Even when the model’s predicted location is wrong, the predicted grids are reasonable:

Forgivable mistakes: the shaded area indicates grids with high scores, and the predicted location is the average of the grid centroids weighted by their scores

Catching a cheat

My first iteration of the model gave me very good results. The accuracy hit 60%, which was incredible for this approach and far above the benchmarks reported in similar research. This aroused my suspicion, and I dug in further.

Upon further investigation, I found that the predictions were very accurate even for bad quality images.

Good prediction for bad quality images — very suspicious

This was very suspicious, because the loss curves showed no signs of overfitting that would explain it. I then decided to look into the layer activations in the neural network using the Grad-CAM approach, to see what the model was looking at when making these decisions. And lo and behold, the scam was busted!

How the model cheats

The model was looking at the name of the uploader printed on the Google image and using that to predict the location. Very sneaky. Thanks to Grad-CAM, we can now shine a light on the “black box”.
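
For readers who have not seen Grad-CAM before, the core computation is small: average the gradients of the target class score over each feature map to get one weight per channel, take the weighted sum of the maps, and keep only the positive part. The sketch below shows just that step in plain Python, assuming the activations and gradients of a convolutional layer have already been extracted (in practice this is done with framework hooks). The function name and the toy numbers are illustrative and are not taken from the repository.

```python
def grad_cam(activations, gradients):
    """Combine feature maps into a Grad-CAM heatmap.

    activations: K feature maps from a conv layer, each an H x W nested list
    gradients:   gradients of the target class score with respect to those
                 maps, same shape as activations
    """
    height, width = len(activations[0]), len(activations[0][0])

    # Channel weight = gradient averaged over all spatial positions.
    weights = [sum(map(sum, g)) / (height * width) for g in gradients]

    # Weighted sum of the maps, then ReLU to keep only positive evidence.
    cam = [
        [
            max(0.0, sum(w * a[i][j] for w, a in zip(weights, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Scale to [0, 1] so the map can be overlaid on the image.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Toy 2x2 example with two channels (all values are made up).
acts = [[[0.0, 0.0], [0.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(acts, grads)
```

In this toy case the only hot pixel is the bottom-right one, which is the kind of pattern that gave the cheating model away.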

When I masked the names in the bottom-right corner, the model just went haywire. Gotcha!

Prediction accuracy dropped dramatically when the name of the photographer was masked

In later iterations, I cropped out the bottom portion of the image. Accuracy was lower, but the model picked up general patterns such as landscape, buildings, vegetation, roads and terrain when making its predictions.

Model re-trained on images with photographer names masked to prevent cheating
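
The two fixes described above (masking the corner for the test, cropping the bottom strip for retraining) can be sketched as follows. This is a minimal illustration on an image stored as nested lists of pixels. The function names and the box sizes are hypothetical, since the article does not state how large the masked or cropped region was.

```python
def mask_bottom_right(image, box_height, box_width, fill=0):
    """Return a copy with the bottom-right box filled with a constant."""
    masked = [list(row) for row in image]
    first_row = max(0, len(masked) - box_height)
    for row in masked[first_row:]:
        first_col = max(0, len(row) - box_width)
        row[first_col:] = [fill] * (len(row) - first_col)
    return masked


def crop_bottom(image, strip_height):
    """Return a copy without the bottom strip_height rows."""
    keep = max(0, len(image) - strip_height)
    return [list(row) for row in image[:keep]]


# Toy 4x4 single-channel image; the sizes are made up for illustration.
image = [[1] * 4 for _ in range(4)]
masked = mask_bottom_right(image, box_height=1, box_width=2)
cropped = crop_bottom(image, strip_height=1)
```

Masking keeps the image size unchanged, which makes it convenient for a quick before/after comparison on an already trained model, while cropping removes the leaking region entirely before retraining.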

Modeling Approach

I framed this as a multi-class classification problem. I split the map of India into grids and trained the model with the grid number as the target variable.

Dataset Preparation

I overlaid an isometric grid onto the map of India. The resulting grid cells were the target classes the model needed to predict.
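
To make the idea of "grid number as target" concrete, here is a minimal sketch that maps a coordinate to a class id and back to a cell centroid. It swaps in a plain rectangular latitude/longitude grid rather than the isometric grid used in the article, and the bounding box and the 8 x 8 resolution are rough values picked for illustration only.

```python
# Rough bounding box and resolution, chosen for illustration only.
LAT_MIN, LAT_MAX = 6.0, 38.0
LON_MIN, LON_MAX = 68.0, 98.0
N_ROWS, N_COLS = 8, 8  # hypothetical; the article's grid has 58 classes


def grid_id(lat, lon):
    """Map a coordinate to the id of the rectangular cell containing it."""
    if not (LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX):
        raise ValueError("point lies outside the bounding box")
    row = min(int((lat - LAT_MIN) / (LAT_MAX - LAT_MIN) * N_ROWS), N_ROWS - 1)
    col = min(int((lon - LON_MIN) / (LON_MAX - LON_MIN) * N_COLS), N_COLS - 1)
    return row * N_COLS + col


def grid_centroid(cell):
    """Return the (lat, lon) centre of a cell id."""
    row, col = divmod(cell, N_COLS)
    lat = LAT_MIN + (row + 0.5) * (LAT_MAX - LAT_MIN) / N_ROWS
    lon = LON_MIN + (col + 0.5) * (LON_MAX - LON_MIN) / N_COLS
    return lat, lon


cell = grid_id(12.97, 77.59)    # a point in southern India
centre = grid_centroid(cell)
```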

I then uniformly sampled points in each grid, used Google’s Street View API to find the nearest location with a Street View image, and grabbed 4 images from the 360° view at headings of 0, 90, 180 and 270 degrees.

Data preparation — random sampling location coordinates from grids and scraping for images
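
The sampling step can be sketched roughly as below. The code draws a random point inside a rectangular cell and builds one request URL per heading for Google's Street View Static API. It only constructs the URLs and makes no network calls. The cell bounds, the image size and the `YOUR_API_KEY` placeholder are assumptions for illustration, and the article does not say which endpoint or parameters the original scraper used.

```python
import random
from urllib.parse import urlencode

STREETVIEW_URL = "https://maps.googleapis.com/maps/api/streetview"
HEADINGS = (0, 90, 180, 270)  # the four view angles mentioned in the article


def sample_point(lat_min, lat_max, lon_min, lon_max, rng=random):
    """Draw one point uniformly (in degrees) from a rectangular cell."""
    return rng.uniform(lat_min, lat_max), rng.uniform(lon_min, lon_max)


def image_urls(lat, lon, api_key, size="640x640"):
    """Build one Street View Static API request URL per heading."""
    return [
        STREETVIEW_URL + "?" + urlencode({
            "size": size,
            "location": f"{lat:.6f},{lon:.6f}",
            "heading": heading,
            "key": api_key,
        })
        for heading in HEADINGS
    ]


rng = random.Random(0)  # seeded so the example is repeatable
lat, lon = sample_point(10.0, 14.0, 75.0, 79.0, rng)  # hypothetical cell
urls = image_urls(lat, lon, api_key="YOUR_API_KEY")
```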

Modeling

Model Architecture: ResNeXt-50

Number of classes: 58

The coordinates of the final prediction are calculated as the sum of the grid centroids weighted by the predicted probability distribution.
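
As a rough illustration of that last step, here is a minimal sketch of turning class probabilities into a single coordinate. The function name `predict_location` and the three example centroids are made up for illustration. The real model has 58 grid centroids, and the repository may implement this differently.

```python
def predict_location(probs, centroids):
    """Probability-weighted average of grid centroids.

    probs:     one score per grid cell (for example a softmax output)
    centroids: matching list of (lat, lon) cell centres
    """
    total = sum(probs)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    lat = sum(p * c[0] for p, c in zip(probs, centroids)) / total
    lon = sum(p * c[1] for p, c in zip(probs, centroids)) / total
    return lat, lon


# Made-up example with three cells instead of 58.
centroids = [(10.0, 76.0), (20.0, 78.0), (28.0, 77.0)]
probs = [0.5, 0.25, 0.25]
lat, lon = predict_location(probs, centroids)
```

Averaging latitude and longitude directly treats the map as flat, which is an approximation. It also means the predicted point can land between two likely regions when the probability mass is split.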

Evaluation

Method: Group KFold — 10 splits — grouped by location
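
Grouping by location matters here because each location contributes four images, and letting them fall on both sides of a split would be another form of leakage. The sketch below is a simplified stand-in for grouped K-fold splitting, written in plain Python. It is not scikit-learn's `GroupKFold` implementation, and the function name and toy data are hypothetical.

```python
from collections import defaultdict


def group_folds(groups, n_splits):
    """Assign each sample to a fold so that no group is split across folds.

    groups: one group label per sample (here, a location id)
    Returns a list holding the fold index of every sample.
    """
    members = defaultdict(list)
    for index, group in enumerate(groups):
        members[group].append(index)
    if len(members) < n_splits:
        raise ValueError("need at least as many groups as folds")

    fold_sizes = [0] * n_splits
    fold_of = [None] * len(groups)
    # Largest groups first, each placed into the currently smallest fold.
    for group in sorted(members, key=lambda g: -len(members[g])):
        fold = fold_sizes.index(min(fold_sizes))
        for index in members[group]:
            fold_of[index] = fold
        fold_sizes[fold] += len(members[group])
    return fold_of


# Six hypothetical locations with four images each (one per heading).
groups = [loc for loc in range(6) for _ in range(4)]
folds = group_folds(groups, n_splits=3)
```

Each fold is then used once as the validation set while the remaining folds are used for training.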

Confusion Matrix:

Confusion Matrix on Evaluation Dataset

Average Accuracy: 25%
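
For reference, a confusion matrix and the accuracy derived from it can be computed as in the minimal sketch below. The labels are toy values with three classes, not results from the model, and the function names are illustrative.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(map(sum, matrix))
    return correct / total if total else 0.0


# Toy labels for three classes (made up for illustration).
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2, 1]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
acc = accuracy(cm)
```

With grouped K-fold evaluation, an average accuracy is the mean of this value over the validation folds.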

GitHub Repo: https://github.com/kvsnoufal/ImageGeoLocation

Shoulders of giants

  1. PlaNet geolocation with Convolutional Neural Networks — https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45488.pdf
  2. DeepGeo: Photo Localization with Deep Neural Network — https://arxiv.org/abs/1810.03077
  3. GradCam on ResNext: https://www.kaggle.com/skylord/grad-cam-on-resnext

About The Author

I work at Dubai Holding, UAE, as a data scientist. You can reach out to me at kvsnoufal@gmail.com or https://www.linkedin.com/in/kvsnoufal/
