Illuminating the Dark Web
Three years ago, Harvard student Eldo Kim emailed a false bomb threat to campus administration, seeking to avoid taking final exams. In order to evade detection, Eldo sent the email through a series of nominally anonymous services: the email originated from a self-deleting email address provided by the service Guerrilla Mail, which he accessed via the well-known anonymity software Tor. Yet within two days Eldo was apprehended by the FBI. His fatal flaw: accessing Tor through Harvard’s wireless network. When the investigators checked who on the network had been using Tor when the threat was sent, a suspect emerged, and a confession soon followed.
Situations like this show that even the most thorough privacy and anonymization protocols are not sufficient to ensure anonymity; careless usage of technology is an ever-present vulnerability. Our study sought to analyze the prevalence of a similar privacy oversight in one of the most anonymity-valuing realms on the Internet: dark net markets. Is it possible that the users of such sites are unwittingly revealing their location as they post images to the markets?
The dark web consists of Internet content hosted on dark nets: overlay networks that can only be accessed with special anonymity software such as Tor. Though the terms are often erroneously used interchangeably, the dark web is distinct from the deep web, which is the collection of content on the Web that is not indexed by search engines (e.g., password-protected or dynamic websites).
In the public view, the dark web — as its name may suggest — is shrouded in mystery, and often seen as an anonymous haven for criminality. This use of the dark web was noted by former British Prime Minister David Cameron at a summit against child pornography in 2014, where he stated that “the dark net is … [not] the normal parts of the Internet that we all use.”
Recent work by Dr. Gareth Owen and Dr. Nick Savage at the University of Portsmouth to quantify and categorize the usage of the dark web determined that about one third of the dark web may be categorized as dealing with Drugs, Markets, and Fraud, three activities that depend on the anonymity provided by the dark web when skirting around the law.
These activities might be best exemplified by the infamous Silk Road 1 marketplace, the “eBay of illegal drugs” that, when shut down by the FBI in 2013, comprised “more than 70% of the online drug market”.
Dark web sites such as Silk Road 1 rely on the anonymous routing provided by Tor, a specialized Internet browser and software suite which was described by the NSA in a leaked top-secret document as “the king of high-secure, low-latency anonymity.” Furthermore, Tor was acknowledged by the British Parliamentary Office of Science and Technology in March 2015 to be “by far the most popular anonymous internet communication system,” having an estimated 2.5 million daily users.
Our goal was to leverage a longitudinal archive of dark net markets (DNMs) to collect and analyze sale listing images with metadata containing location data. When a digital image is taken, additional information, called EXIF data, is attached to the image itself. This may include the date of capture, the make and model of the camera, and other information. Smartphones and many modern cameras can use GPS technology to attach location information to pictures as they are taken (a “geotagged” image). In our investigation, we searched for the presence of these geotags in the images of items for sale on dark net market sites. We also aimed to identify any notable groupings and patterns in the locations found, both across and within markets.
The data used for this study was the Black-Market Archives, a collection of DNM web scrapes created by an independent researcher and writer who goes by the handle Gwern. After the widely publicized takedown of Silk Road 1 by the FBI in December 2013, Gwern began regularly scraping market sites and associated forums in order to create a more comprehensive picture of the constantly shifting DNM landscape. The freely available archive encompasses 83 different DNMs and 40 associated forums from 2013 to 2015 (exact dates vary from site to site). This amounts to over 44 million files, taking up over 1.5 TB of space.
In order to analyze the listing images inside each archive, we first searched for and compiled a list of the file paths of all JPEG images to ensure that no file went untested. (Images used for listings were only in the JPEG format; other image formats, such as PNG and GIF, were used for website graphics.) Then, using Python and bash scripts, we checked each image’s EXIF data for longitude or latitude data, saving the coordinates of each geotagged photo and its file path to a text file.
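The scanning step described above might look roughly like the following. This is a minimal sketch, not the code used in the study: it assumes the third-party Pillow library (a recent release that provides `Exif.get_ifd`), and the function names, the tab-separated output format, and the file name `geotagged.txt` are all hypothetical. It also requires both latitude and longitude to be present, which is stricter than checking for either one.

```python
from pathlib import Path

GPS_IFD_TAG = 0x8825  # EXIF tag that points at the GPS sub-directory


def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) plus an N/S/E/W reference
    into signed decimal degrees."""
    degrees, minutes, seconds = (float(part) for part in dms)
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref in ("S", "W") else value


def read_geotag(path):
    """Return (latitude, longitude) for a geotagged image, else None."""
    # Assumption: Pillow is installed. Imported lazily so that
    # dms_to_decimal stays usable without it.
    from PIL import Image

    try:
        with Image.open(path) as img:
            gps = img.getexif().get_ifd(GPS_IFD_TAG)
    except OSError:  # unreadable, truncated, or non-image file
        return None
    # GPS tag ids: 1 = GPSLatitudeRef, 2 = GPSLatitude,
    #              3 = GPSLongitudeRef, 4 = GPSLongitude
    if not all(tag in gps for tag in (1, 2, 3, 4)):
        return None
    return dms_to_decimal(gps[2], gps[1]), dms_to_decimal(gps[4], gps[3])


def scan_archive(root, out_file="geotagged.txt"):
    """Walk `root` and write 'path<TAB>lat<TAB>lon' for each geotagged JPEG."""
    with open(out_file, "w", encoding="utf-8") as out:
        for path in Path(root).rglob("*"):
            if path.is_file() and path.suffix.lower() in (".jpg", ".jpeg"):
                coords = read_geotag(path)
                if coords:
                    out.write(f"{path}\t{coords[0]:.6f}\t{coords[1]:.6f}\n")
```

EXIF stores each coordinate as degrees, minutes, and seconds with a separate hemisphere reference, which is why the conversion helper is needed before the values can be plotted or compared.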
We then determined the number of unique coordinates among the results. Very commonly, a listing and its associated image would be hosted on a DNM for several days, so the same image, and thus the same coordinates, appeared in our results multiple times.
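As an illustration of this de-duplication, the sketch below keeps one entry per distinct coordinate pair. The function name and the sample records are hypothetical and are not drawn from the archive.

```python
def unique_by_coordinates(records):
    """Keep the first file path seen for each distinct (lat, lon) pair.

    `records` is an iterable of (path, latitude, longitude) tuples.
    """
    first_seen = {}
    for path, lat, lon in records:
        first_seen.setdefault((lat, lon), path)
    return first_seen


# Hypothetical records: the same listing image scraped on two different days.
records = [
    ("2014-01-05/listing/1.jpg", 40.446111, -79.982222),
    ("2014-01-06/listing/1.jpg", 40.446111, -79.982222),
    ("2014-01-06/listing/2.jpg", 51.5, -0.12),
]
unique = unique_by_coordinates(records)
print(len(records), "geotagged files,", len(unique), "unique coordinate pairs")
```

Keying on the exact coordinate pair treats two different photos taken at precisely the same spot as one, so a count produced this way is a lower bound on the number of distinct images.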
Across these markets and forums, we located 2,276 total geotagged images, which, after eliminating duplicates that were available over multiple days, gave 229 total unique images with associated coordinates. The coordinates, with decimals removed from the numbers to protect privacy, can be seen plotted in the map below. (The coordinates may be up to about one mile away from their true location.)
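One way to coarsen coordinates in this manner is to truncate them to a fixed number of decimal places. The sketch below is only an illustration: the choice of two decimal places is our assumption for the example (the exact precision is not stated here), and the figure of roughly 69 miles per degree of latitude is an approximation.

```python
import math

MILES_PER_DEGREE_LAT = 69.0  # approximate length of one degree of latitude


def truncate_coordinate(value, places=2):
    """Drop digits beyond `places` decimal places (truncation, not rounding)."""
    factor = 10 ** places
    return math.trunc(value * factor) / factor


def max_offset_miles(places, latitude_deg):
    """Rough upper bound on the displacement introduced by truncation."""
    step = 10 ** -places  # largest change in degrees on either axis
    north_south = step * MILES_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return math.hypot(north_south, east_west)


print(truncate_coordinate(40.446111), truncate_coordinate(-79.982222))
print(f"worst case at the equator: {max_offset_miles(2, 0):.2f} miles")
```

Under these assumptions the worst-case displacement stays just under one mile, which is consistent with the error bound quoted above.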
In total, we analyzed 7,522,284 images from the entire DNM archive, representing 223,471* unique photos. Table 1 presents a summary of markets containing geotagged images:
Table 1. Markets and Geotagged Images
*Our code counted unique images by comparing filename hashes. Evolution did not hash filenames; instead, it named every image “large.jpg”, “small.jpg”, etc. Thus we do not have a count of total unique images in this market. The 42 uniquely geotagged images, then, are differentiated based on unique sets of coordinates.
The total number of geotagged images found was far greater than the number of unique images. This is, in a way, a measure of the longevity of online listings — the longer a listing was hosted on the website, the greater the probability that it was re-scraped. While some images appeared only once in the archive, the particular case of the dark net market Area 51 presented several images which appeared 107 times, representing a time span of 215 days.
Agora presents an interesting case: from the archive’s beginning on January 1, 2014 through March 18 of that year, we identified 963 non-unique tagged images, the most of any market in the study. However, after March 18 we found no images with location data through the end of the archive. The sudden cutoff in what was previously a plentiful source of images suggests that the site’s administrators may have discovered this geotag vulnerability and instituted some form of metadata removal after this date.
Of course, EXIF data can be not only stripped but outright modified to point to an entirely different location. However, patterns in our findings support the validity of the results. First, it was common to observe sites, typically residential, surrounded by 5–10 tagged images separated by a few meters. This suggests the behavior of sellers who are careless on a regular basis, rather than occasional forgetfulness in stripping data or purposeful manipulation. Second, we found several instances of these clusters incorporating listings on multiple sites, pointing to sellers who were active across several markets and failed to strip their products’ location data on any of them.
The dark net markets exist to facilitate the transaction of illicit or regulated goods, such as drugs and firearms. As such, preserving the anonymity of both the seller and the buyer is paramount. The existence of the geotagged photos online thus represents a notable failure on two fronts: (1) by the seller and (2) by the website.
Failure by the seller
The geotagged photos typically show the good that is being sold; as such, it seems reasonable to assume that the picture was taken by the seller. While all photos have EXIF data, modern cameras (especially those on GPS-equipped smartphones) can also add the coordinates of where the picture was taken to the EXIF data. Given the popularity and accessibility of smartphones, it is likely that many of these photos were taken with smartphones, and that a subset of those photos contained geotag metadata. If the seller does not remove the geotag data before uploading a photo to a listing, they risk the publication of that location data, essentially blindly handing the management of this sensitive information to a third-party website.
Failure by the website
Preventing even a single instance of metadata exposure requires one of two things: (1) every user of the site must be technologically capable and aware of the necessity to manually strip each picture’s metadata before upload, or (2) the site itself must strip or otherwise hide identifiable EXIF data in all uploaded images. The second approach can be seen, for example, with Facebook, which ensures that no images have any user-accessible EXIF data. Though the data may still be stored internally on Facebook servers, Facebook stresses that “we don’t display location EXIF data in the version of your photo that you share with others.”
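To illustrate what site-side stripping could involve, the sketch below removes EXIF segments from a JPEG without re-encoding the image, using only the Python standard library. It is a simplified example under stated assumptions, not a description of how any market or Facebook handles uploads: the function name is hypothetical, it handles only the common case in which every marker before the image data carries a length field, and it leaves other metadata carriers (such as XMP) untouched.

```python
import struct

SOI = b"\xff\xd8"            # start-of-image marker
SOS = 0xDA                   # start of scan: compressed image data follows
APP1 = 0xE1                  # application segment that carries EXIF
EXIF_HEADER = b"Exif\x00\x00"


def strip_exif(jpeg_bytes):
    """Return a copy of a JPEG byte string with EXIF APP1 segments removed."""
    if not jpeg_bytes.startswith(SOI):
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(SOI)
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg_bytes[pos + 1]
        if marker == SOS:
            break  # the rest is image data and is copied verbatim below
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        if length < 2:
            raise ValueError(f"invalid segment length at offset {pos}")
        end = pos + 2 + length  # the length field counts its own two bytes
        payload = jpeg_bytes[pos + 4:end]
        if not (marker == APP1 and payload.startswith(EXIF_HEADER)):
            out += jpeg_bytes[pos:end]  # keep every non-EXIF segment
        pos = end
    out += jpeg_bytes[pos:]
    return bytes(out)
```

Because the pixel data is copied byte for byte, this approach does not degrade image quality, and a site could apply it to every upload before the file is stored or served.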
Thus, given the importance for dark net markets of maintaining a reputation for legitimacy and safety for both sellers and buyers, and considering the difficulty of ensuring best practices by all users, the markets’ failure to strip or hide EXIF data is a significant security oversight. Indeed, it is surprising that the markets above did not have automated metadata stripping or concealment protocols, especially as the sites that yielded the majority of geotagged images (Agora, BlackBank, and Silk Road 2) were all highly popular marketplaces until their respective closures.
Further possibilities for research arising from this study include an analysis of the manufacturers and models of the cameras used to take the geotagged photos. (Do these sellers prefer iOS, Android, or non-smartphone cameras?)
Given the accuracy of coordinates, a study analyzing the viability of a policing tool that identifies geotags in illicit listings may be productive. However, given the low overall percentage of geotagged images found in the complete archives, we believe that regularly analyzing the EXIF data of DNM listings is not an efficient means of policing sellers of illicit materials.
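The low overall percentage can be made concrete with the counts reported earlier. The short calculation below is our own arithmetic on those figures; note that the unique-photo total has no count for Evolution, so the second ratio is only approximate.

```python
TOTAL_IMAGES = 7_522_284      # all images analyzed
GEOTAGGED_IMAGES = 2_276      # geotagged images, duplicates included
UNIQUE_PHOTOS = 223_471       # unique photos (no count available for Evolution)
UNIQUE_GEOTAGGED = 229        # unique geotagged images


def share(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100 * part / whole


print(f"all images:    {share(GEOTAGGED_IMAGES, TOTAL_IMAGES):.3f}% geotagged")
print(f"unique photos: {share(UNIQUE_GEOTAGGED, UNIQUE_PHOTOS):.3f}% geotagged")
```

Both shares come out to about a tenth of a percent or less, which is the basis for judging routine EXIF scanning to be an inefficient policing method.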
We are indebted to Gwern for his tireless work and dedication in creating the archive that made this work possible.
- Gwern Branwen, Nicolas Christin, David Décary-Hétu, Rasmus Munksgaard Andersen, StExo, El Presidente, Anonymous, Daryl Lau, Sohhlz, Delyan Kratunov, Vince Cakic, Van Buskirk, & Whom. “Dark Net Market archives, 2011–2015”, 12 July 2015. Web. 12 September 2016. www.gwern.net/Black-market%20archives
A big thank you to Professor Jim Waldo for his support and advice with this research, from its nascent stages as the final project for his course CS 105: Privacy and Technology to its completion many months later.
About the authors
Michael Rose is a Harvard senior studying government and computer science with a special focus on privacy and technology policy. His past experience includes work with the U.S. State Department, research with Harvard’s Berkman Klein Center and a recent paper published in Technology Science. He is a longtime choral singer and photography enthusiast — come find him online @michael_c_rose and at michaelrose.co!
Paul Lisker is a computer science and government student at Harvard College with an interest in the growing intersection of technology, privacy, and government. His research as a Technology and Data Governance fellow at the Federal Trade Commission was recently published in Technology Science. A proud dual Mexican and US citizen, he is passionate about music and is a soccer aficionado. Tweet hello @paullisker and come visit lisker.me!