Mapping Facial Recognition in Online Media

by Abdul Sattar, Joachim Mensa-Annan, Mathias Sloth, Mikail Aydilek, Sebastian Jensen

Introduction

Biometrics is becoming an ever more integrated part of society. DNA databases hold our genomes, fingerprints are filed in government registers and our faces are archived on social media platforms, among other examples. Facial recognition is a technology that can identify a person based on their facial features. At airports, people’s faces may be scanned and archived, some larger streets are monitored by cameras that can spot a particular face in a crowd, and our smartphones can be unlocked by scanning our faces. This technology poses new challenges and questions about what privacy is and how the technology is going to be used (Brad Smith, 2016; Dawn Kawamoto, 2018). The industry is moving forward while scepticism about the usefulness of the technology and its consequences for social equity is becoming apparent (Constantine von Hoffman, 2018; Christina Couch, 2017).

Drawing on tools from the field of Digital Methods, we have explored news articles from the database LexisNexis for the period 2007–2017 to map how the media covers facial recognition and how that coverage evolves. We do this to find possible controversies, and to find out what people are saying about facial recognition technology and how they are saying it. Our approach is controversy mapping, which aims to untangle a controversial subject. By mapping different aspects of facial recognition we can open up the controversy over data gathering and usage, along with the issues surrounding it.

In our previous work we explored the topic of biometrics on Wikipedia and found that issues such as privacy and bias were spread thinly among different topics. Facial recognition piqued our interest because it was very prevalent in the discussion of privacy. In this blog post we explore which topics the media focuses on in relation to facial recognition and how that focus has developed over time.

Protocol

For the kind of data we wanted to harvest, we chose the database LexisNexis. The database holds a large number of articles related to business, law and marketing. We were interested in news articles, so we used the database option “Major World Publications” to select a manageable number of news sources from English-language publishers such as The New York Times, The Guardian and The Washington Post.

We extracted two types of data: the full articles (used for keyword search and a semantic analysis), and the metadata from the database (used for examining the subjects of articles and how certain subjects develop over time). The digital platform CorText was used to produce a semantic network of all the data, which was then visualized in Gephi, as well as “heat maps” of each year (explained further below).

This section documents every step of the controversy mapping process, including where the data was gathered, how it was processed and for what purposes, such as visualizations.

Semantic Network Analysis

Network made with CorText showing co-occurrence between terms in the articles. The nodes represent terms, while edges link terms that co-occur. We picked the 500 most frequent terms and manually removed those that had little meaning in relation to our topic (e.g. english, newspaper) and caused noise in our network. The end result was 346 terms.

Our co-occurrence network is based on the top 500 terms from our corpus, which is composed of articles harvested from LexisNexis. We harvested articles from 2007 to 2017 using the search words “Face/Facial recognition”. There are five clusters, two of which are bigger than the others. The first cluster we want to examine is the yellow cluster (Security and Law Enforcement) at the top of the network. This cluster is composed of terms related to CCTV, law enforcement, terror, surveillance, state, security and political figures. It sits close to two smaller clusters: a red one named ‘Privacy and Tech Industry’ and a dark blue one named ‘Recognition and Identification’. The red cluster has terms closely related to ‘privacy’ and concerns around this topic. The dark blue cluster has terms closely related to ‘biometrics’ and ‘recognition and identification technology’. This shows us that these three themes occur together. The most notable nodes connecting all three clusters are ‘Privacy Advocates’, ‘Privacy Commissioner’, ‘Government Agency’, ‘Federal Government’, ‘Home Affairs’, ‘Biometric Data’ and ‘ID Cards’. For example, ‘face recognition’ and ‘terror’ co-occur because facial recognition technology can potentially be used to identify suspected terrorists in airports and public spaces, or be misused for the same purpose.

At the bottom of our co-occurrence network we have two clusters close to each other, one light blue and one green. The light blue cluster has terms closely related to ‘Computer Software’ and ‘Hardware and Mobile Phones’. The green cluster has terms closely related to ‘Apple’, such as the iPhone and other technology developed by the company. There is a big gap between the two bottom clusters and the three clusters at the top. The light blue and green clusters are, however, connected to the red and dark blue clusters through several small nodes, most notably terms like ‘Voice Recognition’, ‘Fingerprint Reader’, ‘Virtual Reality’, ‘Eric Schmidt’ (ed. CEO of Google 2001–2011 and Executive Chairman 2011–2015), and ‘Mobile App’.
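To make the idea of a term co-occurrence network more concrete, here is a minimal Python sketch. It is not CorText's actual method, which uses its own term extraction and proximity measures; it simply assumes that two terms co-occur when they appear in the same article, and it matches terms by plain substring search. The function name, the toy articles and the term list are all invented for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents, vocabulary, min_weight=1):
    """Count how often pairs of vocabulary terms appear in the same document."""
    vocab = [term.lower() for term in vocabulary]
    pair_counts = Counter()
    for text in documents:
        lowered = text.lower()
        # Terms present in this document (plain substring match, an assumption).
        present = sorted({term for term in vocab if term in lowered})
        # Every unordered pair of present terms co-occurs once in this document.
        pair_counts.update(combinations(present, 2))
    # Keep only pairs that co-occur at least min_weight times.
    return {pair: n for pair, n in pair_counts.items() if n >= min_weight}


# Toy corpus, invented for illustration only (not taken from LexisNexis).
docs = [
    "Police use facial recognition and CCTV for surveillance.",
    "Privacy advocates criticise CCTV surveillance.",
    "The new iPhone unlocks with facial recognition.",
]
terms = ["facial recognition", "cctv", "surveillance", "privacy", "iphone"]

edges = cooccurrence_edges(docs, terms)
for (a, b), weight in sorted(edges.items()):
    print(f"{a} -- {b}: {weight}")
```

Each key in `edges` is a pair of terms (a link between two nodes) and each value is the number of articles in which both appear. A weighted edge list like this could be exported to a tool such as Gephi, where a clustering algorithm groups densely connected terms.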

Heat Maps of Semantic Network

These are CorText heat maps showing which terms are present each year. The red hue indicates which terms are used that year. The datasets had to be split up for practical reasons, so some years with many articles have more than one map (e.g. 2017 1, 2017 2 etc.).
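As a rough analogue of what such a heat map conveys, the sketch below computes, for each year, the share of articles that mention at least one term from a given cluster. This is only an illustration under simplifying assumptions: CorText computes its heat maps differently, and the cluster names, term lists and articles here are invented.

```python
from collections import defaultdict


def cluster_activity(articles, clusters):
    """Return {year: {cluster: share of that year's articles mentioning a cluster term}}."""
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for year, text in articles:
        totals[year] += 1
        lowered = text.lower()
        for name, cluster_terms in clusters.items():
            # An article counts once per cluster, however many terms it mentions.
            if any(term in lowered for term in cluster_terms):
                hits[year][name] += 1
    return {
        year: {name: hits[year][name] / totals[year] for name in clusters}
        for year in totals
    }


# Hypothetical (year, text) pairs and cluster term lists, for illustration only.
toy_articles = [
    (2007, "CCTV cameras help police."),
    (2007, "Police trial face recognition."),
    (2017, "The iPhone unlocks with your face."),
    (2017, "Police test an iPhone app."),
]
toy_clusters = {"security": ["cctv", "police"], "apple": ["iphone"]}

activity = cluster_activity(toy_articles, toy_clusters)
for year in sorted(activity):
    print(year, activity[year])
```

Because each value is a share of that year's articles rather than a raw count, years with many articles and years with few can be compared directly.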

We see that the activity moves from the yellow cluster, Security and Law Enforcement, to the green cluster, iPhone and Apps. Many of the technologies in the green cluster did not exist in 2007, so the development of these technologies influences what the media writes about facial recognition. These new technologies seem to catch the attention of the media.
The red cluster, Privacy and Tech Industry, becomes active from around 2011 onwards. In 2013 we see that the yellow and red clusters become active around ‘Edward Snowden’ and ‘Privacy’, while new technologies, like self-driving cars and Google Glass, are also increasingly discussed.
Two nodes are very prominent and relate to a scandal in Australia, where paparazzi claimed that, using facial recognition, they had found nude photos of a senator taken in her youth. (https://www.news.com.au/news/raunchy-images-not-of-pauline/news-story/e84717357df0d0c86279061a9be06258)

The heat maps from the different years show that the media in the LexisNexis database are highly focused on new technologies, scandals and issues related to face/facial recognition. Since the database contained tabloids and business publications, for example The Daily Mail, The Sun, Marketing Week and Financial Advisor, our map has many nodes related to topical events that the media find interesting. Privacy issues became interesting for the media and business world in the 2010s with actors such as Edward Snowden and the EU Commission, when Snowden exposed the American National Security Agency’s spying on the public (Glenn Greenwald, Ewen MacAskill and Laura Poitras, 2013) and the EU began working on legislation such as the GDPR (European Data Protection Supervisor, 2019). This is of great interest to the business media, as legislation has an impact on business (Greg Shepard, 2018).

The case of the Australian senator Pauline Hanson shows why opponents of facial recognition see personal privacy at stake when it comes to the future of this technology. If data can be used for slander or blackmail against a person or organization, the public could come to distrust the companies or governments that collect the data.

Timelines on keywords and selected article subjects

Bar chart timeline showing the approximate number of times specific keywords were mentioned each year, from 2007 to 2017, in the articles harvested from LexisNexis. The data has not been adjusted for differences in the number of articles per year.

While all keywords get more mentions because the number of articles grows (433 in 2007, 766 in 2012, 2419 in 2017 etc.), we can still see spikes in mentions. The keyword ‘privacy’ is mentioned frequently throughout the 10-year span, whereas the keyword ‘discrimination’ is noticeably less frequent. This shows us that ‘privacy’ is more common than ‘discrimination’ in articles relating to face/facial recognition. In 2012 and 2014, however, ‘discrimination’ is mentioned more often in our dataset. This could be a small outlier, as the chart shows only 0–15 mentions for ‘discrimination’, whereas the increases in 2016 and 2017 follow the trend of the other keywords. Looking through the articles mentioning ‘discrimination’ in 2012 and 2014, we find articles about targeted advertising based on facial biometrics and new airport security measures. This suggests that facial recognition and its potential negative aspects only gain prominent media attention once the technology is already available and close to the public.
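Since the chart shows raw counts, one way to separate real spikes from the general growth of the corpus would be to express mentions per 100 articles. The sketch below uses the article counts reported above; the keyword mention counts are hypothetical placeholders, not values from our dataset, and the function name is our own.

```python
# Article counts per year as reported in the text.
articles_per_year = {2007: 433, 2012: 766, 2017: 2419}

# Hypothetical raw mention counts for one keyword, invented for illustration only.
privacy_mentions = {2007: 40, 2012: 120, 2017: 300}


def mentions_per_100_articles(mentions, articles):
    """Scale raw yearly mention counts by the number of articles that year."""
    return {
        year: round(100 * mentions[year] / articles[year], 1)
        for year in mentions
        if articles.get(year)  # skip years with a missing or zero article count
    }


rates = mentions_per_100_articles(privacy_mentions, articles_per_year)
for year, rate in sorted(rates.items()):
    print(year, rate)
```

With these placeholder numbers the raw count rises every year, while the rate per 100 articles rises and then falls, which illustrates how an unadjusted chart can overstate a trend.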

Streamgraph showing the subject metadata of articles from 2007–2017. Sizes have not been adjusted for the progressively larger number of articles.

For this timeline, we chose a few subjects from our data that we found interesting in conjunction with our search terms, ‘face/facial recognition’. Over a span of 10 years we see these subjects becoming more prevalent in this context. Examples are ‘Cybercrime’ and ‘Terrorism’: with facial recognition technologies becoming more ubiquitous in everyday life, the occurrence of these keywords could represent a general worry that the technologies could be misused for cybercrime (such as identity theft), but also the idea that they could be used as a measure against terrorism, with the recognition features helping to apprehend suspected terrorists.

‘Privacy Rights’ is also interesting in that the topic grows around 2011–2013. This could be related to the spread of personal devices such as smartphones and to cases of large-scale data collection on the public, such as the NSA and Edward Snowden case.

It is important to note that the increase in these subject occurrences over the 10-year span could be due to LexisNexis having expanded its database over time, thereby naturally increasing the number of articles on these subjects.

Postscript

Through our data we have found that security and law enforcement is a major theme in facial recognition. This is consistent with our data from our previous investigation of biometrics on Wikipedia (https://medium.com/mapping-controversies-biometrics/mapping-controversies-biometrics-2c0dbeba2437). A key difference is that we investigated media sources this time, which differ from the encyclopedia format of Wikipedia. With the LexisNexis database we harvested articles leaning towards the world of business publications and tabloids. The semantic map and the accompanying heat maps showed that new technology and specific products (e.g. smartphones and Google Glass) were a main focus of the media. Privacy became a popular theme in the 2010s, around the time of the Edward Snowden case and the discussion about new frameworks for data protection. An interesting thing to note is that American presidents were found in the security and law enforcement cluster, while the EU was in the cluster about privacy. This suggests that we have two governmental entities that differ in their relation to facial recognition. In our network, the ability of facial recognition systems to identify criminals, terrorists etc. is connected to the American government, while the issue of citizens’ privacy is connected to the European Union.

The media begins to talk more about discrimination in relation to facial recognition when the technology comes closer to the user and products begin to enter the public sphere. This could be because the technology’s potential for abuse and misuse only becomes obvious once it enters the public sphere. The blog post from the Microsoft employee Brad Smith suggests that legislation on facial recognition is falling behind technological development. From our data we can see that it is not only legislation but also the public and media discourse on the subject that lags behind facial recognition’s many uses and emerging artifacts. There were approximately 2000 more articles in 2017 than there were in 2007. This might stem from the fact that new technology in facial recognition has increasingly caught the media’s attention over time.
