When machines understand what they see

Take a look at the picture accompanying this entry: it’s a recent photo on my Facebook page showing a bridge over a river at night, reflected in the water, with a few lights celebrating the New Year.

Nothing unusual about that, you might say. Now look at the caption, where a Facebook artificial intelligence algorithm has scanned the photograph, interpreted it, compared it with a huge library of images used to educate it, and concluded that the contents of the photo are: bridge, night, sky, outdoor and water. A spot-on labeling of the photograph, carried out completely automatically, without human intervention. Applications like Google Photos or Apple Photos also use such algorithms to help people organize their files, and the same kind of algorithm is now being applied to photos uploaded to the social network.

It’s pretty easy to obtain the Facebook automatic photo tags: you just locate the photo, enlarge it to full screen, right click to get the context menu, select inspect (in Chrome) or inspect element (in Firefox) and look at the alt attribute of the image. If you prefer a solution without code, you can install a browser extension that displays the tags as an additional layer on the photographs themselves. Facebook has added these tags to images automatically since April 2016, after a 10-month development project to help blind or visually impaired people.
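For anyone who would rather script the inspection than click through the browser's developer tools, here is a minimal sketch in Python using only the standard library. It assumes you have already saved the HTML of the photo page yourself, and that the alt text starts with a prefix such as "Image may contain:" followed by a comma-separated list; that wording is my assumption and Facebook may change it at any time. The names AltCollector and parse_tags are made up for this illustration and are not part of any Facebook tool.

```python
import re
from html.parser import HTMLParser


class AltCollector(HTMLParser):
    """Collects the non-empty alt attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        for name, value in attrs:
            if name == "alt" and value:
                self.alts.append(value)


def parse_tags(alt_text, prefix="Image may contain:"):
    """Split an alt text into individual tags.

    The prefix is an assumption about how the alt text is worded.
    Returns an empty list when the alt text does not start with it.
    """
    if not alt_text.startswith(prefix):
        return []
    body = alt_text[len(prefix):]
    # Tags are separated by commas, with "and" before the last one.
    parts = re.split(r",|\band\b", body)
    return [part.strip() for part in parts if part.strip()]


# Example with a hand-written snippet standing in for a saved page.
html = '<img src="photo.jpg" alt="Image may contain: bridge, night, sky, outdoor and water">'
collector = AltCollector()
collector.feed(html)
for alt in collector.alts:
    print(parse_tags(alt))
```

Any image whose alt text does not follow the assumed wording simply yields an empty list, so the sketch fails quietly rather than guessing.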

Any user of photographic repositories (in my case, beginning with Flickr in 2004) will remember the rules we once had to follow when labeling photographs: choose appropriate labels, maintain consistency, decide between singular and plural, etc., so that content could be found more easily through those tags. Now, an algorithm can see the photo, deduce its content with remarkable accuracy, and assign the corresponding labels with no human intervention.

That a machine can go from seeing an image as a simple set of pixels to interpreting its content and labeling it accurately may not be that new: a lot of time has been spent on this, producing remarkable progress. But seeing that technology applied to your own photographs certainly helps to better understand its potential.

Some naive souls may still think that artificial intelligence is just about carrying out simple exercises, but this kind of technology shows that the reality is very different.

We are not talking about a machine that simply scans a photograph and compares it with others in a database, but about a machine that can see a bridge and understand that it is a bridge, regardless of which bridge it is or from which angle the photograph was taken. This is technology able to compare an image of a bridge with all the photos tagged as bridges that were used in its education and deduce that this new image also shows a bridge … which is basically what humans do, if they too have been properly educated.
