Photo Geolocation with Neural Networks: How to and How not to
Building a neural network that can geotag an outdoor image, and how to catch a cheating neural network with Grad-CAM
Convolutional Neural Networks (CNNs) are currently the state of the art in computer vision. In this article I discuss my experience building a CNN model to geotag an image: take an image as input and predict the location of that image as output. I also discuss a previous iteration of the model that fooled me into believing it had become crazy good at geolocating, when it was in fact a case of data leakage.
The model was trained on a dataset of Google Street View images. I scraped images of random locations in India to generate this dataset. The model is reasonably good at making predictions: it generally predicts a location in the vicinity of the actual one.
Here is a quick view of the results for the impatient. The shaded area indicates grids with high scores, and the predicted location is the average of the grid centroids weighted by their scores:
These are handpicked good examples. Even when the model’s predicted location is wrong, the predicted grids are reasonable:
Catching a cheat
My first iteration of the model gave me very good results. The accuracy hit 60%, which was incredible for this approach and far above the benchmarks reported in similar research. This aroused my suspicion, and I dug in further.
Upon further investigation, I found that the predictions were very accurate even for bad quality images.
This was very suspicious because there were no signs of overfitting in the loss curves. I then decided to look into the layer activations of the neural network using the Grad-CAM approach to see what the model was looking at when making these decisions. And lo and behold, the scam was busted!
The model was looking at the name of the uploader in the Google image and using that to predict the location. Very sneaky. Thanks to Grad-CAM, we can now shine a light on the “black box”.
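For readers who have not used Grad-CAM, its core computation is short. The sketch below is not taken from the linked repo. It assumes you have already extracted, for one image, the activations of the last convolutional layer and the gradients of the target class score with respect to those activations (typically via framework hooks). The function name `grad_cam` and the toy arrays are purely illustrative.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Core Grad-CAM computation for a single image.

    activations: array of shape (K, H, W), feature maps of a conv layer.
    gradients:   array of shape (K, H, W), d(class score)/d(activations).
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    # Channel importance: global-average-pool the gradients over H and W.
    weights = gradients.mean(axis=(1, 2))              # shape (K,)
    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=1)   # shape (H, W)
    # Keep only the regions that push the class score up.
    cam = np.maximum(cam, 0)
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Toy example: channel 0 fires in the bottom-right corner and supports the
# class; channel 1 fires in the top-left corner and opposes it.
acts = np.zeros((2, 4, 4))
acts[0, 3, 3] = 5.0
acts[1, 0, 0] = 5.0
grads = np.zeros((2, 4, 4))
grads[0] = 1.0
grads[1] = -1.0
heatmap = grad_cam(acts, grads)
print(heatmap)
```

In practice the heatmap is upsampled to the input resolution and overlaid on the image, which is how a hot spot over a watermark or caption becomes visible.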
When I masked the names in the bottom-right corner, the model just went haywire. Gotcha!
In later iterations, I cropped out the bottom portion of the image. The model accuracy was not as great, but the model was able to pick up general patterns such as landscape, buildings, vegetation, roads, and terrain when making the prediction.
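Both the masking test and the cropping fix can be sketched in a few lines of NumPy. The fractions used here are hypothetical placeholders, since the article does not state how much of the image was masked or cropped, and the helper names are illustrative only.

```python
import numpy as np

def mask_bottom_right(image, height_frac=0.1, width_frac=0.4, fill=0):
    """Return a copy of `image` (H, W, C) with the bottom-right corner blanked.

    height_frac and width_frac are assumed values, not the author's.
    """
    out = image.copy()
    h, w = image.shape[:2]
    mh = int(round(h * height_frac))
    mw = int(round(w * width_frac))
    out[h - mh:, w - mw:] = fill
    return out

def crop_bottom(image, fraction=0.1):
    """Drop the bottom `fraction` of the rows (assumed value)."""
    h = image.shape[0]
    keep = h - int(round(h * fraction))
    return image[:keep]

# Dummy all-ones image standing in for a Street View frame.
image = np.ones((100, 200, 3))
masked = mask_bottom_right(image)
cropped = crop_bottom(image)
print(masked.shape, cropped.shape)
```

Masking keeps the input size unchanged, which makes it convenient for a quick before-and-after check on a trained model. Cropping removes the leaking pixels from the training data altogether.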
Modeling Approach
I framed this as a multi-class classification problem. I split the map of India into grids and trained the model with the grid number as the target variable.
Dataset Preparation
I overlaid an isometric grid onto the map of India. The resulting grids were the target classes the model needed to predict.
I then uniformly sampled points in each grid, used Google’s Street View API to get the nearest location with a Street View image, and grabbed 4 images from the 360° view at headings of 0, 90, 180, and 270 degrees.
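To make the idea of turning coordinates into class labels concrete, here is a minimal sketch that uses a plain rectangular latitude/longitude grid. This is a simplification: the bounding box is only a rough approximation of India's extent, the 4-degree cell size is an arbitrary assumption, and the resulting cells will not match the grid actually used in this project.

```python
# Rough, assumed bounding box around India (degrees) and an assumed cell size.
LAT_MIN, LAT_MAX = 6.0, 38.0
LON_MIN, LON_MAX = 68.0, 100.0
CELL_DEG = 4.0

N_ROWS = int((LAT_MAX - LAT_MIN) / CELL_DEG)
N_COLS = int((LON_MAX - LON_MIN) / CELL_DEG)

def grid_id(lat, lon):
    """Map a coordinate to the integer id of the cell that contains it."""
    if not (LAT_MIN <= lat < LAT_MAX and LON_MIN <= lon < LON_MAX):
        raise ValueError("coordinate outside the bounding box")
    row = int((lat - LAT_MIN) // CELL_DEG)
    col = int((lon - LON_MIN) // CELL_DEG)
    return row * N_COLS + col

def grid_centroid(gid):
    """Return the (lat, lon) centre of a cell."""
    row, col = divmod(gid, N_COLS)
    return (LAT_MIN + (row + 0.5) * CELL_DEG,
            LON_MIN + (col + 0.5) * CELL_DEG)

# Example: a point near New Delhi.
label = grid_id(28.6, 77.2)
print(label, grid_centroid(label))
```

With a rectangular box, some cells fall entirely over sea or neighbouring countries. A real pipeline would presumably keep only the cells that overlap the country.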
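The sampling step could look roughly like the sketch below. It assumes the Street View Static API image endpoint with its `size`, `location`, `heading`, and `key` parameters. The article does not say which endpoint or image size was used, so treat those as assumptions. The code only builds request URLs and makes no network calls, and `YOUR_API_KEY` is a placeholder.

```python
import random
from urllib.parse import urlencode

STREETVIEW_URL = "https://maps.googleapis.com/maps/api/streetview"

def sample_point(lat_min, lat_max, lon_min, lon_max, rng=random):
    """Draw one point uniformly in latitude/longitude inside a cell.

    Note: uniform in degrees is only approximately uniform in area.
    """
    return rng.uniform(lat_min, lat_max), rng.uniform(lon_min, lon_max)

def streetview_urls(lat, lon, api_key, size="640x640",
                    headings=(0, 90, 180, 270)):
    """Build one image request URL per compass heading."""
    urls = []
    for heading in headings:
        params = {
            "size": size,                 # assumed image size
            "location": f"{lat},{lon}",
            "heading": heading,
            "key": api_key,
        }
        urls.append(f"{STREETVIEW_URL}?{urlencode(params)}")
    return urls

# Example with a hypothetical cell covering 8-12 N, 76-80 E.
rng = random.Random(0)
lat, lon = sample_point(8.0, 12.0, 76.0, 80.0, rng=rng)
for url in streetview_urls(lat, lon, "YOUR_API_KEY"):
    print(url)
```

Many random points will have no imagery nearby, so a real scraper would need to check for coverage before downloading and record the coordinates of the panorama actually returned rather than the sampled point.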
Modeling
Model Architecture: ResNeXt50
Number of classes: 58
The coordinates of the final prediction are calculated as the weighted sum of the grid centroids, using the predicted probabilities as weights.
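As a minimal sketch of that last step, the function below averages grid centroids using the class probabilities as weights. The centroid values and the name `predict_location` are made up for illustration. Dividing by the sum of the weights makes the function also work on unnormalized scores.

```python
def predict_location(probs, centroids):
    """Probability-weighted average of grid centroids.

    probs:     one score per class (ideally softmax probabilities).
    centroids: one (lat, lon) pair per class, in the same order.
    """
    if len(probs) != len(centroids):
        raise ValueError("probs and centroids must have the same length")
    total = sum(probs)
    lat = sum(p * c[0] for p, c in zip(probs, centroids)) / total
    lon = sum(p * c[1] for p, c in zip(probs, centroids)) / total
    return lat, lon

# Toy example with three hypothetical grid centroids.
centroids = [(10.0, 70.0), (20.0, 80.0), (30.0, 90.0)]
probs = [0.5, 0.5, 0.0]
print(predict_location(probs, centroids))
```

One consequence of averaging is that when the probability mass is split between two distant grids, the predicted point lands between them, possibly in a grid the model scored poorly.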
Evaluation
Method: Group KFold — 10 splits — grouped by location
Confusion Matrix:
Average Accuracy: 25%
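Grouping by location matters here because each location contributes four images. If those were split across training and validation, the model would be scored on places it had already seen. The sketch below is a simplified, dependency-free stand-in for that idea (scikit-learn provides `GroupKFold` in `sklearn.model_selection` for real use). It assigns whole groups to folds round-robin, which is an assumption of this sketch and not necessarily how the actual splits were made.

```python
def group_kfold(groups, n_splits):
    """Yield (train_idx, val_idx) pairs where no group is on both sides.

    groups: one group label per sample, e.g. a location id per image.
    """
    unique = sorted(set(groups))
    if n_splits > len(unique):
        raise ValueError("n_splits cannot exceed the number of groups")
    # Round-robin assignment of whole groups to folds.
    fold_of_group = {g: i % n_splits for i, g in enumerate(unique)}
    for fold in range(n_splits):
        val = [i for i, g in enumerate(groups) if fold_of_group[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of_group[g] != fold]
        yield train, val

# Toy data: 20 hypothetical locations with 4 images each.
groups = [loc for loc in range(20) for _ in range(4)]
folds = list(group_kfold(groups, n_splits=10))
for train_idx, val_idx in folds:
    train_locs = {groups[i] for i in train_idx}
    val_locs = {groups[i] for i in val_idx}
    print(len(train_idx), len(val_idx), train_locs.isdisjoint(val_locs))
```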
GitHub Repo: https://github.com/kvsnoufal/ImageGeoLocation
Shoulders of giants
- PlaNet geolocation with Convolutional Neural Networks — https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45488.pdf
- DeepGeo: Photo Localization with Deep Neural Network — https://arxiv.org/abs/1810.03077
- GradCam on ResNext: https://www.kaggle.com/skylord/grad-cam-on-resnext
About The Author
I work at Dubai Holding, UAE, as a data scientist. You can reach out to me at kvsnoufal@gmail.com or https://www.linkedin.com/in/kvsnoufal/