An NBC investigation, “Facial recognition’s ‘dirty little secret’: Millions of online photos scraped without consent”, explores IBM’s use of photographs to train its facial recognition algorithms: the company used photographs taken from Flickr published under Creative Commons licenses to create a database — which it recently made available — and used it to develop its technology.
This is a subject of particular interest to me: I was an early Flickr user and have more than 3,600 photographs stored there, although I have not used it for a while. I also publish all my photos, like most of my professional output, under the least restrictive Creative Commons license (CC BY, or Attribution). Using a tool created by NBC to consult the database IBM has used to train its facial recognition algorithms, I see that the company has taken three images from my collection, some of them at an event in which I appear with friends. I’m sure they had no problem with the photos being published, catalogued or associated with an open license, but they now find that their faces, and possibly other metadata or information such as their names, have been used by a company to develop a controversial technology.
There are a number of aspects to all this: firstly, the legality of using photographs. I am completely used to mine being used for different purposes. I understand how open licenses work and in general I like seeing one of my photographs used in some publication: I would never have imagined that as an amateur photographer my work would appear in media of all kinds, such as Wired. However, there is another issue related to the question of whether IBM’s use of my photographs is legal: the faces of the people who appear in them, over which, logically, I have no rights, nor should I.
Was I mistaken to tag all my photographs as Creative Commons BY? Should I instead have kept strict copyright over those that contained images of people? Instead of using a blanket license, perhaps each time I uploaded a photograph to Flickr I should have thought more about the type of license to use. I’m no lawyer, but even if I accept that responsibility, does that automatically give IBM the right to use my photographs with the faces of my friends in a database? One could argue that the company has exceeded the terms of a license that was designed to regulate the public use of the images, and not other uses.
IBM says it merely used a 14GB file of one hundred million images that Yahoo!, then the owner of Flickr, published openly for use by researchers, which could shift the discussion about responsibility for any misuse of the license elsewhere. IBM reduced the size of the original database, converting it into a file of approximately one million faces, supplemented with about two hundred values ranging from measurements of certain facial dimensions to the type of pose, skin tone, gender or estimated age.
The database has been used to train all kinds of algorithms, including some for police use, as well as IBM’s own tool, IBM Watson Visual Recognition, which can estimate people’s age or gender, as well as recognize specific individuals. Considering the controversy associated with facial recognition technologies, the company should at least have considered requesting permission from the authors of the photographs, instead of assuming that a license not conceived with such uses in mind would cover them.
IBM says it has used the database to try to reduce biases in facial recognition and improve the quality of the technology. But the database is there, available to anyone who wants to download it and put it to potentially harmful use, which means that the time has come for greater control to be applied, and express permission requested for its use.
Where does the problem lie? Misguided trust on the part of the authors of the photographs, or misinterpretation of the potential of open licenses? Have companies abused that trust in using the contents for their own ends? Is it my mistake or Yahoo!’s, or IBM’s? Or are we all to blame? What is happening to all these pictures we are constantly uploading all over the place?
Or perhaps there is no problem here at all and it’s just that we’re going to have to get used to anything we upload being used by third parties for any purpose they want?