What to do with all them old photos?

Tord Nilsen
tordnilsen

--

Every city or community has one: a Facebook group where you can share old photos. People share to inform, to ask questions, or just to collect the rewarding thumbs up.

I find it enjoyable to look at the old photos and learn something about our common history.

But as a computer geek, a software developer, and someone with a more than average interest in cultural heritage, I have asked myself:

- what happens to the digital versions of these photos?

  • Do they disappear if (or when) Facebook no longer exists?
  • Do they have any value as a register/database that exists only on FB?
  • Are they worthy of preservation as historical items for museums?

Collections like these are a nightmare for database developers like me. There is no consistency, no rules. The photos have no hashtags, no EXIF data, they may or may not have comments, and the resolution varies.

Fitting something so inconsistent into something searchable and indexed might be very difficult. Or is it?

Scraping data

I started by scraping the group with a script, which gave me each image URL and the first comment. Facebook has an option to download all photos, but you have to be an administrator to use it.

I saved this data to a text file, and with a second script I downloaded each photo. I ended up with a little over 23,000 images.
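The download step can be sketched roughly like this. The tab separator, file names, and numbering scheme are my assumptions for illustration, not the actual script:

```python
import csv
import os
import urllib.request

def parse_lines(text):
    """Parse scraped 'image URL <TAB> first comment' lines into (url, caption) pairs."""
    pairs = []
    for row in csv.reader(text.splitlines(), delimiter="\t"):
        if len(row) >= 2:
            pairs.append((row[0], row[1]))
    return pairs

def download_all(pairs, folder="photos"):
    """Download each image into the target folder, named by its position in the list."""
    os.makedirs(folder, exist_ok=True)
    for i, (url, _caption) in enumerate(pairs):
        urllib.request.urlretrieve(url, os.path.join(folder, f"{i:05d}.jpg"))
```

Running `download_all(parse_lines(open("scraped.txt").read()))` would then fill the folder, one file per scraped URL.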

Huge text files. Back to the '90s

I saved the photos in a folder, and the text file contained filename and comment. Using the filesystem to store photos and a database to store the filename/location is an easy solution, but it can become a maintenance issue if you ever need to change the location of the files.

Importing data into database

The next step was to get the captions and filenames into some sort of database. I have been developing databases in MS-SQL for many years and know that technology to my fingertips.

So I ended up with MongoDB. Why? To become a more versatile developer.

MongoDB is an Open Source, NoSQL database management system which leverages a JSON-style storage format known as binary JSON, or BSON, to achieve high throughput.

Importing the text file into MongoDB is straightforward: convert whatever file you have into JSON and import it. I used NoSQLBooster, a GUI admin tool, for the import and for an early look at the data.
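The conversion can be as small as this sketch, which turns tab-separated lines into the JSON-lines format that `mongoimport` accepts by default (the field names `filename` and `caption` are my assumptions):

```python
import json

def to_json_lines(text, delimiter="\t"):
    """Convert 'filename <TAB> caption' lines into mongoimport-ready JSON lines."""
    out = []
    for line in text.splitlines():
        parts = line.split(delimiter, 1)
        if len(parts) == 2:
            out.append(json.dumps({"filename": parts[0], "caption": parts[1]}))
    return "\n".join(out)
```

Writing the result to a file, something like `mongoimport --db photos --collection captions --file captions.json` would load it (database and collection names illustrative).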

23k records may sound like a lot, but it only took 2.3MB of storage. As a former database developer in the financial sector, I am used to databases where I have to multiply that number by 100 to get something that looks like a database.

Data needs to be searchable

The whole purpose of this project is to make the data searchable. And because the data is based on comments, it needs full-text search.

MongoDB is great for many tasks, except for the obvious one: it is a database, not a search engine.

There are a couple of alternatives:

  • Sphinx is good for structured data (predefined text fields and non-text attributes), but it is not the best choice for projects that deal with unstructured data.
  • Solr is not as quick as ES and works best for static data (that does not require frequent changes). Solr also has machine learning features.
  • ElasticSearch is currently the #1 search engine, but it is still a young technology, and not all desired features come out of the box.

Solr and ES stand out as the two best alternatives. I know them both very well, and for this project either would fit my needs. However, my laptop already had ES installed, so that is what I chose.

Elasticsearch is an open-source, distributed, RESTful full-text search engine built on top of Apache Lucene. It is developed in Java, uses schema-free JSON documents, and comes with extensive REST APIs for storing and searching the data.
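Once the captions are indexed, a search is just a small JSON body sent to the REST API. A sketch of what such a query might look like (the index name `photos` and field name `caption` are assumptions about how the data was synced):

```python
import json

def caption_query(text):
    """Build an Elasticsearch match-query body for free-text caption search.

    'fuzziness: AUTO' lets the search tolerate small typos in the comments.
    """
    return {"query": {"match": {"caption": {"query": text, "fuzziness": "AUTO"}}}}

# This body would be POSTed to e.g. http://localhost:9200/photos/_search
body = json.dumps(caption_query("old harbour"))
```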

ElasticSearch and MongoDB do not talk to each other out of the box; you need some kind of connector. As always, there are many options. Mongo-connector, a real-time sync service written in Python, is the obvious choice. However, I wanted to explore new technologies, so I went for Monstache, a real-time sync service written as a Go daemon.

Installing Monstache is similar to Mongo-connector: create a MongoDB replica set and install it as a service. They both work by tailing MongoDB's oplog.
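A minimal Monstache configuration can look something like this sketch; the connection URLs and the `photos.captions` namespace are illustrative assumptions, not my actual setup:

```toml
# Where to read from and write to
mongo-url = "mongodb://localhost:27017"
elasticsearch-urls = ["http://localhost:9200"]

# db.collection namespaces to sync into Elasticsearch,
# including an initial bulk read of existing documents
direct-read-namespaces = ["photos.captions"]
```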

Kibana is an open source data visualization plugin for Elasticsearch. It provides visualization capabilities on top of the content indexed on an Elasticsearch cluster.

Visualizing photos

I installed the Cantaloupe image server. I chose Cantaloupe because of its compliance with the IIIF Image API.

The IIIF API specifies a web service that returns an image in response to a standard HTTP or HTTPS request. The URL can specify the region, size, rotation, quality, and format of the requested image. A URL can also be constructed to request basic technical information about the image to support client applications.
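The IIIF Image API 2.x request pattern is `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`, which a tiny helper can assemble (the host name and identifier below are illustrative, not my actual server):

```python
def iiif_url(base, identifier, region="full", size="full",
             rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API 2.x request URL:
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    return f"{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

# e.g. a 512-pixel-wide derivative of one scanned photo
url = iiif_url("https://example.org/iiif/2", "photo-00042", size="512,")
```

So the same source image can be served full-size to a lightbox and as a small thumbnail to a result list, just by varying the URL.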

The IIIF API was designed to facilitate systematic reuse of image resources in digital image repositories maintained by cultural heritage organizations.

Cantaloupe is an open-source dynamic image server, written in Java, for on-demand generation of derivatives of high-resolution source images.

I quickly made a PWA (Progressive Web Application) based on Ionic (because that is what full-stack developers do) that could browse images from the IIIF API. Nothing fancy, just a proof of concept.

Using MongoDB, ElasticSearch and IIIF might look like overkill for this project. And it is. The project is an example of what should, and could, be done with large collections of photos.

Back to my questions. Do the photos disappear when FB disappears? Yes. Do they have any value outside of FB? Yes; this project shows that it is possible to build a photo collection outside of FB’s walls. Are they of any value as cultural heritage? Yes, yes and yes. When I look at the comments on these photos, there is so much information, so much history and knowledge. All of that will be gone if cultural heritage institutions do not start collecting them.

Up next

The next step will be to unleash the power of artificial intelligence on the data. I’m going to let a TensorFlow neural network analyse the photos and comments. TensorFlow will help me fill in the blank comments I have in the database, and also help me hashtag the photos.

Step 3 is to use my AI-curated photo collection to create an application aimed at tourism, with a Generous Interface. All neatly tucked into a Progressive Web Application based on the Ionic Framework.

I will use Firebase to host the PWA, so I get notifications and a realtime database ‘for free’; that means I might implement some sort of social interaction and commenting in step 4.

Disclaimer! Step 5 will be to delete the whole project.

--


Digital innovator passionate about the cultural sector. Exploring new ways to engage audiences through strategy, technology, and creativity.