The day I fed my friends to an IBM algorithm

Enrique Dans
Mar 19, 2019 · 4 min read

An NBC investigation, “Facial recognition’s ‘dirty little secret’: Millions of online photos scraped without consent”, explores IBM’s use of photographs to train its facial recognition algorithms: the company used photographs taken from Flickr published under Creative Commons licenses to create a database — which it recently made available — and used it to develop its technology.

This is a subject of particular interest to me: I was an early Flickr user and have more than 3,600 photographs stored there, although I have not used it for a while. I also publish all my photos, like most of my professional output, under the least restrictive Creative Commons license (CC BY, or Attribution). Using a tool created by NBC to consult the database IBM used to train its facial recognition algorithms, I see that the company has taken three images from my collection, some of them from an event at which I appear with friends. I’m sure my friends had no problem with the photos being published, catalogued or associated with an open license, but they now find that their faces, and possibly other metadata or information such as their names, have been used by a company to develop a controversial technology.

There are a number of aspects to all this. The first is the legality of using the photographs. I am completely used to mine being used for different purposes. I understand how open licenses work and in general I like seeing one of my photographs used in some publication: I would never have imagined that, as an amateur photographer, my work would appear in media of all kinds, such as Wired. However, the question of whether IBM’s use of my photographs is legal raises another issue: the faces of the people who appear in them, over which, logically, I have no rights, nor should I.

Was I mistaken to tag all my photographs as Creative Commons BY? Should I instead have kept strict copyright over those that contained images of people? Instead of using a blanket license, perhaps each time I uploaded a photograph to Flickr I should have thought more about the type of license to use. I’m no lawyer, but even if I accept that responsibility, does that automatically give IBM the right to use my photographs, with the faces of my friends, in a database? One could argue that it has exceeded the terms of a license that was designed to regulate the public use of the images, not other uses.

IBM says it merely used a 14GB file of one hundred million images that Yahoo!, then the owner of Flickr, published openly on Yahoo! for use by researchers, which could shift the discussion about responsibility for any misuse of the license elsewhere. IBM reduced the size of the original database, converting it into a file of approximately one million faces, supplemented with about two hundred values ranging from measurements of certain facial dimensions to the type of pose, skin tone, gender or estimated age.

The database has been used to train all kinds of algorithms, including some for police use, as well as IBM’s own tool, IBM Watson Visual Recognition, which can estimate people’s age or gender and recognize specific individuals. Considering the controversy associated with facial recognition technologies, the company should at least have considered requesting permission from the authors of the photographs, instead of relying on a license that was not conceived with such uses in mind.

IBM says it has used the database to try to reduce biases in facial recognition and improve the quality of the technology. But the database is there, available to anyone who wants to download it and put it to potentially harmful use, which means that the time has come for greater control to be applied, and express permission requested for its use.

Where does the problem lie? Misguided trust on the part of the authors of the photographs, or misinterpretation of the potential of open licenses? Have companies abused that trust in using the contents for their own ends? Is it my mistake or Yahoo!’s, or IBM’s? Or are we all to blame? What is happening to all these pictures we are constantly uploading all over the place?

Or perhaps there is no problem here at all and it’s just that we’re going to have to get used to anything we upload being used by third parties for any purpose they want?

This article was previously published on Forbes.

(In Spanish, here)

Written by Enrique Dans

Professor of Innovation at IE Business School, blogger at enriquedans.com and Senior Contributor at Forbes

On the effects of technology innovation on people, companies and society (writing in Spanish at enriquedans.com since 2003)
