What to do with them old photos? Part 2

Tord Nilsen
Apr 29, 2019


TensorFlow is looking at them old photos.

In Part 1 I downloaded a lot of old photos from a Facebook group and imported them into MongoDB.

Now I will try to get TensorFlow to do some magic.

They say a picture is worth a thousand words, but a picture with context is worth a thousand times more.

When I downloaded the photos from the Facebook group, I lost something important: the context. I have the first comment, and in more than half of the photos the story is there: "this is the opening of some-event". But very often the first comment is a question: "does anyone know where/what this is?" The benefit of social media is that someone probably knows the answer and gives the photo some context in the comments.

I wanted to see if AI and machine learning could give the photos some context.

Installation

Installing TensorFlow with GPU support and Python on a gaming computer is not straightforward. Nvidia drivers have a tendency to conflict, and I spent a whole day getting everything up and running. But looking back it was worth it: a good GPU makes the AI training a lot faster.

CPU load without GPU
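Before training anything it is worth checking that TensorFlow actually sees the card. A minimal sanity check, assuming the TensorFlow 1.x API that was current in 2019:

```python
import tensorflow as tf

# Both calls exist in the TF 1.x line; if either prints False,
# training will silently fall back to the CPU.
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("GPU available: ", tf.test.is_gpu_available())
```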

I used PyCharm as my integrated development environment.

Training for captions

After installing everything, it was time to do some Python coding.

I wrote my own code and partially used examples found on the TensorFlow website.

To get the AI to generate captions I downloaded COCO (the Common Objects in Context dataset) and used 50k images in the training. Features for each image were extracted with InceptionV3.

The COCO dataset is an object detection dataset with 80 classes, 80,000 training images and 40,000 validation images.

The Inception v3 model is a deep convolutional neural network that has been pre-trained for the ImageNet Large Scale Visual Recognition Challenge.
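The feature-extraction step follows the TensorFlow image-captioning tutorial: strip the classification head off InceptionV3 and keep the last convolutional layer's output as the image representation. A rough sketch, assuming a 2019-era tf.keras API ('photo.jpg' is a placeholder path):

```python
import tensorflow as tf

def load_image(image_path):
    # Read, resize to InceptionV3's 299x299 input, and scale pixels
    # the way InceptionV3 expects (-1 to 1).
    img = tf.io.read_file(image_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, (299, 299))
    return tf.keras.applications.inception_v3.preprocess_input(img)

# Pre-trained on ImageNet, without the classification head: the last
# convolutional layer's output becomes the features the caption
# decoder attends over.
base = tf.keras.applications.InceptionV3(include_top=False, weights='imagenet')
feature_extractor = tf.keras.Model(base.input, base.layers[-1].output)

features = feature_extractor(tf.expand_dims(load_image('photo.jpg'), 0))
# features.shape == (1, 8, 8, 2048); the tutorial reshapes this to
# (64, 2048) so the decoder can attend over 64 image regions.
```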

Result

Training an AI to caption images with attention is maybe the most difficult task. Especially when the training dataset and the photos to be analysed are 30 or more years apart.
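For context, "attention" here is the mechanism that decides which region of the photo the model looks at while it writes each word. A minimal sketch of the additive (Bahdanau-style) attention module from the TensorFlow captioning tutorial, under the same tf.keras assumptions as above:

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.Model):
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, features, hidden):
        # features: (batch, 64, embedding_dim) image regions from InceptionV3
        # hidden:   (batch, units) decoder state after the previous word
        hidden_with_time_axis = tf.expand_dims(hidden, 1)
        # Score every region against the decoder state, then normalise:
        # the weights say which part of the photo the model is "looking
        # at" while it emits the next word.
        score = self.V(tf.nn.tanh(
            self.W1(features) + self.W2(hidden_with_time_axis)))
        attention_weights = tf.nn.softmax(score, axis=1)
        # Weighted sum of the regions -> one context vector per word.
        context_vector = tf.reduce_sum(attention_weights * features, axis=1)
        return context_vector, attention_weights
```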

Some examples:

RESULT: A group of people is standing on a bridge
RESULT: A group of people on a skateboard on a cement park

When I look at the photos and see the results, I can understand why the AI thinks what it does. In its defence, in the first example there is a bridge. A stone bridge :)

The second photo is a good example of the generation gap. In 2019 there would probably be a skateboard there.

But…

Is captioning what I really wanted? Does a caption with attention tell the story behind a photo?

No. Attention describes what is going on in a photo, in real time. Not the history behind it. And it is the story behind the photo that I want stored digitally for future generations.

Up next

In Part 3 I will do a simple photo analysis with TensorFlow, store hashtags in MongoDB and implement Elasticsearch. To get user involvement I will implement some gamification and a Generous Interface. And I downloaded a lot of photos from Riksantikvarieämbetet and Norvegiana to get more accurate hashtags…

Downloading 60k+ images with captions from Cultural Heritage APIs


Tord Nilsen

Digital innovator passionate about the cultural sector. Exploring new ways to engage audiences through strategy, technology, and creativity.