[Week 5-6 - Where is this, in Ankara?]

Şule Alp
Image Based Geolocalization: Ankara
Dec 23, 2017

We managed to finish our dataset and finally got some satisfying results with our model. But before we get into that, we had promised heavy detail on our data collection process. We should also mention that by the end of week 5 our work was so awkwardly fragmented that it made no sense to publish something unfinished that would not have meant much on its own. Since we were very close to finishing the project as a whole, we decided to post two weeks' worth of meaningful work rather than one post that was quite weak and another stuffed with information. So, first, the promised details:

We had previously set the goal of finishing the dataset, and we managed to do just that with a slight delay. Being a two-person team, we had originally split the data collection in half; however, after a while we noticed we were falling behind the other groups, so we restructured the division of labor entirely in order to pipeline the workload and reach results faster. We evaluated our individual strengths and weaknesses: Alper was more efficient and proficient at data collection, while Şule clearly had a better understanding of the task and how to approach it, as she had taken Computer Vision before whereas Alper had not and thus lacked the confidence. Alper was also much more willing to do the tedious work of data collection, was more patient with it, and had yielded more results than Şule up until this shift in the division of labor.

First, Şule used a very small subset of the collected data to get started, and this data was increased notably once the collection was "complete". That complete dataset was rather unsatisfying, because we had originally limited ourselves to a maximum picture count per location, which led to many mistakes; for example, a skyline shot of Anıtkabir could not be recognized because there were no such shots in our training set. After this point Alper dedicated himself to collecting far more data than before, spending a tremendous amount of time to more than triple our dataset. Some locations' picture counts could not be increased any further from online sources, so he went to a few of the places that were lacking and took pictures manually.

The end result is that we have 3591 pictures in our total dataset (test and train combined). Our original set of 100 different locations also grew to 120 locations, with some of them going up from 12 pictures to as many as 89. This is despite some of the original locations being merged: since this is a geolocation problem and those places were right next to each other, merging them also solved the problem of labeling pictures that featured both landmarks together. Now let's get into the technical aspects, our method and so on.

We obtained our model through transfer learning. As explained in [1], there are a couple of ways to do transfer learning. We started with fine-tuning, which means we took a pretrained convolutional network as initialization, replaced its final layer, and backpropagated again to fine-tune the weights for our classes. For example, ResNet from Microsoft classifies 1000 different categories, but we need 120 because our dataset contains 120 different categories. We replaced its final layer with a new layer that outputs 120 categories and trained again so that the network could adjust to this change.
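As a rough illustration of that final-layer swap in PyTorch (a minimal sketch; the ResNet variant and variable names here are illustrative, not our exact code):

```python
import torch.nn as nn
from torchvision import models

# Start from an ImageNet-pretrained ConvNet (1000 output categories).
model = models.resnet18(pretrained=True)

# Swap the final fully connected layer for one that outputs our 120 locations.
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, 120)

# All layers remain trainable, so backpropagation fine-tunes the pretrained weights as well.
```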

The other method would be to extract features from the ConvNet and then train a linear classifier on those features. We might try that and compare the two results. Yet another option would be to use a linear classifier on our outputs, because some images contain more than one category, for example this image:

Sheraton, Atakule

Our model is supposed to give either Sheraton or Atakule as the result for this image. With a linear classifier we might obtain a more generalized result, like Çankaya. But for now, we will be analyzing the fine-tuning results.
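For reference, the fixed-feature-extractor alternative from [1] would look roughly like this in PyTorch (a hedged sketch, not something we have implemented yet):

```python
import torch.nn as nn
from torchvision import models

# Use the pretrained ConvNet as a fixed feature extractor: freeze its weights
# and train only a linear classifier on top of the extracted features.
model = models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False        # convolutional weights stay fixed

model.fc = nn.Linear(model.fc.in_features, 120)  # only this new layer gets trained
```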

We used PyTorch for the ConvNets and followed its transfer learning tutorial[2] to train on our dataset. Our results:

We also trained ResNet101 for 25 epochs and got our best results with it: it reached 0.8304 training accuracy and 0.81 validation accuracy.
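In outline, the training loop we followed from the tutorial[2] looks like this (the hyperparameters shown are the tutorial's defaults and the loader name is illustrative; our exact values may differ):

```python
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler

# `model` is the fine-tuned ResNet from the sketch above; `train_loader` is assumed
# to be a DataLoader over our Ankara training images.
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)  # decay LR every 7 epochs

for epoch in range(25):
    model.train()
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # step the learning-rate schedule once per epoch
```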

To test our model, we created a .json file for the category labels[3].
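A sketch of how such a label file can be used to classify a single test image (the file name, image name, and preprocessing values are illustrative, following the usual ImageNet-style pipeline):

```python
import json
import torch
from PIL import Image
from torchvision import transforms

# Load the label file: a mapping from class index to location name.
with open('ankara_labels.json') as f:          # file name is illustrative
    idx_to_location = json.load(f)             # e.g. {"0": "Anıtkabir", "1": "Atakule", ...}

# Standard ImageNet-style preprocessing for a single test image.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = preprocess(Image.open('test.jpg').convert('RGB')).unsqueeze(0)  # add a batch dimension
model.eval()
_, pred = torch.max(model(img), 1)
print(idx_to_location[str(pred.item())])
```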

A few interesting results from our model:

Correctly predicted: “Başkent Üniversitesi”
Predicted “Tabiat Tarihi Müzesi”; it should have been “Başkent Üniversitesi”
Correctly predicted: “Bilkent Center”
Correctly predicted: “Bilkent”
Predicted “Çankaya Köşkü, Pembe Köşk”; it should have been “Bilkent”
Correctly predicted: “Bahçelievler 7. Cadde”
Correctly predicted: “İzmir Caddesi”
Correctly predicted: “İzmir Caddesi”

[1]: http://cs231n.github.io/transfer-learning/

[2]: http://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html

[3]: http://blog.outcome.io/pytorch-quick-start-classifying-an-image/
