Doppelgängers on the Dark Web: A Large-scale Assessment on Phishing Hidden Web Services
Author: Changhoon Yoon (Founder and Chief Researcher of S2W LAB)
This paper was published at The World Wide Web Conference in May 2019.
Download → http://nss.kaist.ac.kr/papers/yoon-www2019.pdf
Anonymous network services on the World Wide Web have emerged as a new web architecture, called the Dark Web. The Dark Web has been notorious for harboring cybercriminals who abuse anonymity. At the same time, the Dark Web has been a last resort for people who seek freedom of the press and want to avoid censorship. This anonymous nature allows website operators to conceal their identity and thereby makes it difficult for users to determine the authenticity of websites. Phishers abuse this uncertain authenticity to lure victims; however, little is known about the prevalence of phishing attacks on the Dark Web. We conducted an in-depth measurement study to demystify the prevalent phishing websites on the Dark Web. We analyzed the text content of 28,928 HTTP Tor hidden services hosting 21 million dark webpages and confirmed 901 phishing domains. We also discovered a trend on the Dark Web in which service providers perceive dark web domains as their service brands. This trend exacerbates the risk of phishing for service users who remember only part of a Tor hidden service address. Our work facilitates a better understanding of the phishing risks on the Dark Web and encourages further research on establishing authentic and reliable services on the Dark Web.
The Web is the most popular and accessible platform for sharing and disseminating information across the globe. However, the Web has not only a bright side but also a dark side. A set of platforms that host websites whose owners and users remain anonymous is now referred to as the Dark Web, whereas the Surface Web hosts regular websites. The Dark Web hosts websites whose formats and appearances are the same as those of the Surface Web. However, the way to access the Dark Web is different from that of the Surface Web: it requires Dark Web service providers and their visitors to use an anonymity network service to hide their identities on the Web. The definition of the Dark Web has not officially been established, but the term is often used by the popular press and the security community to emphasize illicit activities that abuse anonymity networks. In this paper, we use the term “Dark Web” to refer to the collection of hidden Web services built on anonymous networks.
The Dark Web has become a major distribution channel for delivering and advertising malicious content. Silkroad and Hansa-Market are well-known Dark Web marketplaces that sell drugs, illegal weapons, and even malware. In addition, researchers have revealed that the Dark Web contains a considerable amount of harmful content [27, 53], and their findings have been confirmed by government investigative agencies as well.
Conversely, the Dark Web offers a last resort for people who want to avoid censorship, to uphold freedom of the press, and even to minimize tracking risks to their privacy. For instance, Venezuela, amid its recent financial turmoil, has blocked access to political and social content on the Web, leaving Tor as the only option for reaching the restricted content.
Motivation. Phishing is one of the most effective threats for harvesting users’ privacy-sensitive information.
Previous investigative studies have therefore emphasized the severity of phishing attacks and the prevalence of phishing campaigns on the Surface Web [44, 54, 59, 61, 64]. Thomas et al. assessed the severity of Web phishing campaigns. Their study showed that phishing websites on the Surface Web emulating Gmail, Yahoo, and Hotmail logins had managed to steal 1.4 million credentials. In contrast, the phishing threats on the Dark Web are understudied. Relatively little is known about the prevalence of phishing websites across the Dark Web. This gap stems from the absence of an oracle that tells whether a given dark website is a phishing site or not. On the Surface Web, sufficient information is available to identify the owners operating websites, including HTTPS certificates, WHOIS, and DNS records. On the other hand, most dark website owners seek to hide their identities, which naturally makes it difficult for users to distinguish an authentic dark website from the phishing websites that imitate it.
We argue that it is crucial to address these phishing risks. On the Dark Web, users have no practical way to check the authenticity of Web services except by relying on out-of-band clues, which exacerbates the severity of phishing threats. To the best of our knowledge, no previous study has investigated the phishing risks on the Dark Web. Prior work focused on analyzing illegal content or unexpected activities [26, 42, 43, 46, 51], uncovering illegal activities [24, 25, 29, 34, 58, 63], and analyzing the popularity of the content on the Dark Web [27, 28].
Contributions. We conducted an in-depth analysis to identify phishing websites on the Dark Web. To do this, we collected more than 21 million webpages from 100K Tor hidden services over seven months. Our dataset reflects the most up-to-date and comprehensive characteristics of the Dark Web. We start by identifying phishing candidates whose website contents are almost identical to those of other websites. In other words, we investigate how many domains with distinct content, and how many duplicates of them, constitute the Dark Web. We employ a carefully designed content grouping algorithm that classifies onion domains into content-wise distinct website groups based on the text and the title of each Tor onion domain’s homepage. We observe that only 5,718 website groups exhibiting distinct content are available on the Tor network. Interestingly, the content of each of the top two website groups is duplicated in over 200 domains, which calls into question the authenticity of these domains.
For each website group consisting of multiple domains, we analyze all of the identified duplicates and confirm the presence of abundant phishing websites among them. Specifically, we identify 791 phishing websites that target five major Dark Web services, including dark marketplaces and Bitcoin mixing services. We further analyze how often users would encounter such phishing domains by counting cross references from other domains. In general, an authentic dark website is referenced by other dark domains more frequently than its phishing websites are. Interestingly, however, the most referenced Dream Market website (Dream Market being a popular black market) is a phishing domain, which demonstrates that the attacker diligently spreads phishing domains.
To find further phishing websites on the Dark Web, we leverage “gray websites.” A gray website is a website that provides identical services on both the Surface Web and the Dark Web.
By leveraging the shared ownership of a gray website and its corresponding surface website, we find 297 phishing websites on the Dark Web that target gray websites, including Facebook. Our study is the first large-scale investigative study that confirms the prevalence of phishing sites on the Dark Web. Biryukov et al. briefly mentioned the presence of one phishing website that mirrored Silkroad.
We also analyze a common trend in onion domain addresses on the Dark Web. Each website on the Tor network is represented by an onion domain name, a cryptographic string computed from the owner’s public key. Considering that generating even an arbitrary five-character prefix of an onion domain is computationally expensive and that visitors must still type the entire domain, we expected no meaningful prefixes in onion names. However, we observe that the majority of Dark Web service providers intentionally generate their onion domains to have meaningful prefixes that are at least five characters long. We conclude that the onion domain name itself is perceived as a brand by service providers on the Dark Web. This trend leaves dark websites vulnerable to phishing attacks because victims will rely on a memorable partial prefix, rather than the entire onion domain, to determine the authenticity of a service.
In this paper, we systematically study unique and duplicate textual content based on a comprehensive Tor anonymity network dataset, which is larger than that of any previous research. We observe abundant duplicate websites that target phishing victims and characterize them. We also confirm close correlations between website content and onion domains, which exacerbate the phishing risks on the Dark Web. Our work facilitates a better understanding of the phishing websites on the Dark Web and invites further research.
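To make the cost of a meaningful onion prefix concrete, the following is a minimal sketch, not the authors' tooling. It assumes the older 16-character (v2) onion naming scheme, in which the name is the base32 encoding of the first 80 bits of the SHA-1 digest of the service's public key; the text above only states that the name is computed from the public key, so this choice is an assumption. The helper names (`v2_onion_address`, `expected_attempts`, `brute_force_prefix`) are hypothetical, and random bytes stand in for real key material.

```python
import base64
import hashlib
import os

# Base32 uses 32 symbols (a-z and 2-7), so each character carries 5 bits.
ONION_ALPHABET_SIZE = 32


def v2_onion_address(public_key_bytes: bytes) -> str:
    """Assumed v2-style derivation: base32 of the first 80 bits of SHA-1(key)."""
    digest = hashlib.sha1(public_key_bytes).digest()
    return base64.b32encode(digest[:10]).decode("ascii").lower() + ".onion"


def expected_attempts(prefix: str) -> int:
    """Expected number of random keys to try before one matches the prefix."""
    return ONION_ALPHABET_SIZE ** len(prefix)


def brute_force_prefix(prefix: str, max_attempts: int):
    """Try random stand-in keys until the derived name starts with `prefix`.

    Returns (attempts_used, address), or None if the budget is exhausted.
    The prefix should use only the characters a-z and 2-7.
    """
    for attempt in range(1, max_attempts + 1):
        candidate_key = os.urandom(140)  # hypothetical stand-in, not a real key
        address = v2_onion_address(candidate_key)
        if address.startswith(prefix):
            return attempt, address
    return None


if __name__ == "__main__":
    # A five-character prefix needs about 32**5 candidate keys on average.
    print(expected_attempts("dream"))
    # A one-character prefix is found almost immediately.
    print(brute_force_prefix("a", max_attempts=10_000))
```

Under this assumption, each additional prefix character multiplies the expected work by 32, which is why a prefix of five or more characters indicates deliberate effort by the service operator, and why a phisher who matches only the memorable prefix pays far less than one who would need to match the full name.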