Dark Web Monitoring (Part 1)

Abdelkader BEN ALI
9 min read · Aug 11, 2020


Over the years, monitoring the Dark Web has presented a serious challenge for Threat Intelligence teams due to the sheer size of the Dark Web ecosystem.

The Dark Web hosts a variety of websites, forums, and IRC channels that differ in page structure, language, category (ranging from drugs and guns to pornography, terrorism, and cybercrime), and content format (PDF, exe, txt, etc.).

As mentioned in my previous article, it is easy to monitor paste websites like Pastebin because we know the structure of websites, what type of data is pasted, etc. But that’s not the case when it comes to the Dark Web.

Note!! There are also paste sites on the Dark Web that offer high anonymity through Tor or I2P.

The purpose of this article is to help you understand the Dark Web ecosystem and how collection and analysis can be done when dealing with websites and forums (public and private).

My first article was on how to monitor paste websites for information that can be used to commit fraud against organizations.

Dark Web Monitoring (Part 2) covers Dark Web Forums Monitoring in depth.

Dark Web

The Dark Web, also called the Dark Net, is a subset of the Deep Web that is not indexed by search engines and can only be accessed through specialized anonymity systems such as The Onion Router (Tor) and the Invisible Internet Project (I2P).

The Dark Web offers anonymous content hosting that encourages cyber criminals to share information, and therefore it can be a valuable area for identifying current and future threats.

Dark Web Ecosystem

Dark Web Websites

Tor sites, whose URLs end with .onion, and I2P sites, whose URLs end with .i2p, are considered part of the Dark Web.
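As a small illustration of the rule above, here is a minimal sketch that tells which network a link belongs to based only on its hostname suffix. The function name dark_web_network is my own choice, and the sketch assumes the URL includes a scheme such as http://.

```python
from typing import Optional
from urllib.parse import urlsplit


def dark_web_network(url: str) -> Optional[str]:
    """Return "tor" or "i2p" based on the hostname suffix, else None."""
    # urlsplit only fills in the hostname when the URL has a scheme (http://...)
    host = (urlsplit(url).hostname or "").lower()
    if host.endswith(".onion"):
        return "tor"
    if host.endswith(".i2p"):
        return "i2p"
    return None
```

A crawler can use a check like this to decide whether a discovered link has to be fetched through Tor, through I2P, or not at all.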

There is a wide variety of Tor and I2P sites, including many online marketplaces where illegal content is available for sale.

Alongside fake identities, drugs, and contract killers, an increasing amount of digital data is being marketed and sold on these websites, including malware, exploit kits, confidential documents, credentials, credit card numbers, banking information, and complete personal identity kits.

Black Market Activities Marketplace

Dark Web Forums

Dark Web Forums are sophisticated marketplaces frequented by hackers and cybercriminals who discuss and share information about vulnerabilities, exploits, and businesses.

Any mention of your business name, domain name, emails or IP addresses in these forums may indicate that you are being targeted or that you have already been hacked.

Dark Web forums often require logins and CAPTCHAs, making them hard to access without human assistance. Some criminal forums are only accessible by invitation.

Dark Web IRC

IRC (Internet Relay Chat) channels are anonymous chat rooms where users discuss a variety of topics. They can be used for legal and illegal purposes.
In the case of the Dark Web, IRC channels are primarily used to discuss hacks, dumps, and vulnerabilities.

Dark Web Social Networking Sites

On the Dark Web, you can find social networking sites similar to Facebook and Twitter. These sites are used by people who want to share information without being tracked.

For example, SecureDrop is a whistleblower submission system, reachable through .onion addresses, that protects the privacy of whistleblowers and journalists around the world.

Dark Web Search Engines

There are many search engines on the Dark Web. They work like Google but are less effective. For example:

  • Ahmia Search Engine
  • Not Evil Search Engine
  • DuckDuckGo Search Engine

Dark Web Searchable Databases

If you want to search for your leaked passwords, take a look at PWNDB.

PWNDB link: http://pwndb2am4tzkvold.onion

To access it from Google Chrome: http://pwndb2am4tzkvold.onion.ws

Dark Web Email Services

On the Dark Web, there are email providers similar to Outlook and Gmail.

The most used providers are ProtonMail and SecMail.

Dark Web Language Distribution

According to a 2015 Trend Micro study of the Deep Web, English was the most used language, followed by Russian, then French.

Most popular languages based on the number of domains

Trend Micro White paper: Below the Surface: Exploring the Deep Web is available here.

Identifying Dark Web Links

Many people wonder how to find Dark Web links. Take a look at the examples below:

Access the Dark Web from Standard Browsers

It is possible to access the Dark Web without installing software such as Tor, I2P, or Freenet.

You can use Onion.ws, which is a darknet gateway.

Replace .onion with .onion.ws in the Dark Web link.
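The replacement can be scripted. Here is a minimal sketch; the function name to_gateway_url and the gateway_suffix parameter are my own, and the only behaviour it encodes is the .onion to .onion.ws rewrite described above.

```python
from urllib.parse import urlsplit, urlunsplit


def to_gateway_url(onion_url: str, gateway_suffix: str = ".ws") -> str:
    """Rewrite http://xxx.onion/path into http://xxx.onion.ws/path."""
    parts = urlsplit(onion_url)
    host = (parts.hostname or "").lower()
    if not host.endswith(".onion"):
        raise ValueError(f"not a .onion URL: {onion_url!r}")
    netloc = host + gateway_suffix
    if parts.port:
        netloc += f":{parts.port}"  # keep an explicit port if there is one
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

For example, to_gateway_url("http://pwndb2am4tzkvold.onion") gives the PWNDB gateway link shown earlier.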

Accessing Dark Web from Google Chrome

Dark Web Monitoring Strategies

Different collection and analysis strategies can be applied when dealing with websites, forums, and IRCs.

Dark Web Websites Collection

When it comes to Dark Web websites, we are interested in the collection of complete website content including:

  • Full website content (in text format)
  • Links
  • Media files
  • Documents (PDF, Word, Excel, etc.)

Note!! Keeping track of a website is difficult: some websites are up in the morning and down in the afternoon, so you need to find a way to track website uptime.
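One simple way to do this is to record every availability check and compute an uptime ratio per site. The sketch below is only an illustration: UptimeLog, check_all, and the probe callable are hypothetical names, and the probe (for example an HTTP request sent through Tor) is left to you.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, List, Optional, Tuple


class UptimeLog:
    """Keeps a history of (timestamp, is_up) checks for each site."""

    def __init__(self) -> None:
        self._checks: Dict[str, List[Tuple[datetime, bool]]] = defaultdict(list)

    def record(self, url: str, is_up: bool, when: Optional[datetime] = None) -> None:
        when = when or datetime.now(timezone.utc)
        self._checks[url].append((when, is_up))

    def uptime_ratio(self, url: str) -> Optional[float]:
        """Share of checks where the site answered, or None if never checked."""
        checks = self._checks.get(url)
        if not checks:
            return None
        return sum(1 for _, up in checks if up) / len(checks)


def check_all(urls: Iterable[str], probe: Callable[[str], bool], log: UptimeLog) -> None:
    """Run the probe on every URL; a probe that raises counts as 'down'."""
    for url in urls:
        try:
            is_up = probe(url)
        except Exception:
            is_up = False
        log.record(url, is_up)
```

Running check_all on a schedule gives you, for each site, a rough idea of when it is worth sending the crawler.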

Media files such as videos can be very large and may cause timeouts during collection, so you can choose to skip them.

Dark Web sites can be public or private.

Public websites do not require membership, so you can automatically collect their full content.

Private websites require membership, so human assistance is required.

If you want to collect data from a private website, you need to identify the structure of the login webpage in order to develop a crawler that automatically authenticates and collects full content.

Note!! To avoid crawling websites that require membership, you can configure your crawlers to ignore websites where login terms exist: “sign in”, “sign up”, “login”, etc.
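A minimal sketch of such a filter is shown below. The terms “log in” and “register”, and the check for a password input field, are my own additions to the list above. The tag stripping is deliberately crude, so expect some false positives.

```python
import re
from typing import Iterable

LOGIN_TERMS = ("sign in", "sign up", "login", "log in", "register")

_TAG_RE = re.compile(r"<[^>]+>")
_SPACE_RE = re.compile(r"\s+")


def requires_membership(html: str, terms: Iterable[str] = LOGIN_TERMS) -> bool:
    """Guess whether a page belongs to a members-only website."""
    lowered = html.lower()
    # A password input is a strong hint even when no login term is visible
    if 'type="password"' in lowered or "type='password'" in lowered:
        return True
    # Strip tags and collapse whitespace, then look for the login terms
    text = _SPACE_RE.sub(" ", _TAG_RE.sub(" ", lowered))
    return any(term in text for term in terms)
```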

Collecting Website Updates

The content of Dark Web websites does not update frequently, so you can collect data periodically, for example every 15 or 30 days.

Dark Web Forums Collection

Collection techniques for Dark Web forums differ from those for websites.

In forums, we are interested in information such as:

  • users
  • discussions
  • subjects
  • exchanged attachments
  • discussion timestamps, etc.
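To make the list above concrete, here is one possible record for a collected post. The class and field names are my own assumptions, not a standard schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class ForumPost:
    """One collected forum post (hypothetical schema)."""
    forum: str                             # forum the post was collected from
    thread_subject: str                    # subject of the discussion
    author: str                            # username as shown on the forum
    body: str                              # text of the post or reply
    posted_at: Optional[datetime] = None   # some forums hide exact timestamps
    attachments: List[str] = field(default_factory=list)  # attachment links
    url: str = ""                          # where the post was found
```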

Similar to websites, Dark Web Forums can be public or private.

Public Forums

In public forums, you can read discussions without needing to authenticate.

Onion Land Forum

Private Forums

In private forums, you can only read discussions when you have successfully authenticated.

DNM Avengers Forum

The problem with private forums is not the login form but the use of dynamic CAPTCHAs, which make them hard to access without human assistance.

How to automatically connect to Dark Web private forums (without CAPTCHA)

Let’s take the example of DNM Avengers Forum.

DNM Avengers Login form

As indicated in the screenshot above, two parameters are required: username and password.

The first step is to identify how these parameters are posted.

  • Inspect the webpage
  • Put any values in the Username and Password fields (in my case aaa as username and bbb as password).
  • Click on Network
  • Finally, click on Login and view the POST request.

To view the request body, click on the POST request, then Edit and Resend, as shown below.

We can now develop a crawler that automatically authenticates itself on this forum.
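Here is a minimal sketch of such an authentication step, assuming a local Tor client and the requests library with SOCKS support. The form field names username and password, the login URL in the usage note, and the success check are all assumptions. Copy the real field names from the request body you saw in the Network tab.

```python
from typing import Any, Dict, Optional

# Tor's SOCKS proxy listens on 127.0.0.1:9050 by default. The "socks5h"
# scheme lets the proxy resolve .onion hostnames instead of local DNS.
TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}


def make_tor_session() -> Any:
    """Create a requests session routed through a local Tor client."""
    import requests  # third-party: pip install "requests[socks]"

    session = requests.Session()
    session.proxies.update(TOR_PROXIES)
    return session


def login(session: Any, login_url: str, username: str, password: str,
          extra_fields: Optional[Dict[str, str]] = None) -> bool:
    """POST the credentials; the session keeps the cookies for later requests."""
    payload = {"username": username, "password": password}  # assumed field names
    payload.update(extra_fields or {})  # e.g. hidden form fields or tokens
    response = session.post(login_url, data=payload, timeout=60)
    # Heuristic (assumption): after a successful login the page no longer
    # contains a password field.
    return response.ok and 'type="password"' not in response.text
```

Usage would look like session = make_tor_session() followed by login(session, "http://forum-address.onion/login", "aaa", "bbb"), where the URL is a placeholder. After that, the same session can be used to fetch the discussion pages.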

How to automatically connect to Dark Web private forums (with CAPTCHA)

Take the example of The Stock Insiders, which is the oldest and largest insider trading forum.

I deliberately exceeded the maximum allowed number of login attempts in order to trigger the CAPTCHA.

We must first identify the CAPTCHA link in order to extract the CAPTCHA image.

The CAPTCHA link is under these tags: <dd class="captcha captcha-image"><img src="link"></dd>
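The extraction can be sketched with Python's built-in HTML parser. The class and function names are my own, and the sketch assumes the markup looks like the tags above: an img element inside a dd element whose class list contains captcha-image.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class _CaptchaImageFinder(HTMLParser):
    """Remembers the src of the first <img> inside <dd class="... captcha-image">."""

    def __init__(self) -> None:
        super().__init__()
        self._in_captcha_dd = False
        self.src: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "dd":
            classes = (attributes.get("class") or "").split()
            self._in_captcha_dd = "captcha-image" in classes
        elif tag == "img" and self._in_captcha_dd and self.src is None:
            self.src = attributes.get("src")

    def handle_endtag(self, tag):
        if tag == "dd":
            self._in_captcha_dd = False


def find_captcha_url(page_html: str, page_url: str) -> Optional[str]:
    """Return the absolute CAPTCHA image URL, or None if the page has none."""
    finder = _CaptchaImageFinder()
    finder.feed(page_html)
    # The src is often relative, so resolve it against the page URL
    return urljoin(page_url, finder.src) if finder.src else None
```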

After identifying the location of the CAPTCHA link, we can automate the extraction of the CAPTCHA image so that it can be solved automatically.

As can be seen, there is a lot of noise in the CAPTCHA image in order to make code identification difficult. So once extracted, you need to remove the noise as shown below.

Noise removal

There are a lot of OCR solutions; I highly recommend pytesseract.
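Here is a minimal sketch of the noise removal and OCR steps, assuming Pillow and pytesseract are installed along with the Tesseract binary. The cleaning recipe (grayscale, median filter, fixed threshold) and the cutoff value of 140 are my assumptions for illustration. Each forum's CAPTCHA will need its own tuning.

```python
import re


def to_black_or_white(pixel: int, cutoff: int = 140) -> int:
    """Map a grayscale value (0-255) to pure black (0) or pure white (255)."""
    return 255 if pixel > cutoff else 0


def clean_ocr_text(raw: str) -> str:
    """Keep only letters and digits from the OCR output."""
    return re.sub(r"[^A-Za-z0-9]", "", raw)


def solve_captcha(image_path: str, cutoff: int = 140) -> str:
    """Denoise a CAPTCHA image and return the text Tesseract reads from it."""
    # Third-party imports: pip install pillow pytesseract
    from PIL import Image, ImageFilter
    import pytesseract

    image = Image.open(image_path).convert("L")              # grayscale
    image = image.filter(ImageFilter.MedianFilter(size=3))   # remove small speckles
    image = image.point(lambda p: to_black_or_white(p, cutoff))  # binarize
    # --psm 7 asks Tesseract to treat the image as a single line of text
    raw = pytesseract.image_to_string(image, config="--psm 7")
    return clean_ocr_text(raw)
```

The OCR result will not always be right, so a crawler should be ready to request a new CAPTCHA and retry when the login fails.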

After identifying the CAPTCHA text, you can apply the same technique as for the previous forum.

Due to the different design structure of Dark Web forums, it is necessary to develop a specific crawler for each forum. This procedure takes time due to the large number of forums.

Note!! One of the biggest challenges when monitoring the dark web is identifying relevant forums.

Collecting Forum Updates

The collection of updates can be periodic or incremental.

Periodic crawling consists of periodically collecting multiple versions of a web page.

Incremental crawling consists of collecting only the new and updated content of a web page. In the case of a forum, we want to collect newly published posts.

Incremental crawlers are faster and produce less duplicated data than periodic crawlers.
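The idea behind incremental crawling can be sketched as follows, assuming every post has a stable identifier. The function names and the use of a SHA-256 fingerprint to detect edited posts are my own choices.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def fingerprint(text: str) -> str:
    """Stable digest of a post body, used to detect edits."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def select_new_or_updated(posts: Iterable[Tuple[str, str]],
                          seen: Dict[str, str]) -> List[str]:
    """Return the IDs of posts that are new or whose body changed.

    posts: (post_id, body) pairs from the current crawl.
    seen:  post_id -> fingerprint from earlier crawls (updated in place).
    """
    changed = []
    for post_id, body in posts:
        digest = fingerprint(body)
        if seen.get(post_id) != digest:
            changed.append(post_id)
            seen[post_id] = digest
    return changed
```

Only the returned posts need to be stored and analyzed, which is where the savings in time and duplication come from.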

Dark Web Websites Analysis

As I mentioned before, the Dark Web contains a wide variety of website categories, ranging from drugs and guns to pornography, terrorism, and cybercrime.

How do we identify our data?

As in the previous article on monitoring paste websites, we can use REGEX to identify company names, domains, email addresses, IDs, credit card numbers, IP addresses, etc.
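Here is a minimal sketch of this kind of matching. The domain example.com and the network 203.0.113.0/24 are placeholders for your organization's own values. The Luhn checksum is an extra filter I add to reduce false positives on card numbers.

```python
import ipaddress
import re
from typing import Dict, List

# Placeholder values: replace with your organization's domain and IP range.
COMPANY_DOMAIN = "example.com"
COMPANY_NETWORK = ipaddress.ip_network("203.0.113.0/24")

EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)*" + re.escape(COMPANY_DOMAIN) + r"\b",
    re.IGNORECASE,
)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # 13 to 16 digits


def luhn_valid(number: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    checksum = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:      # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        checksum += digit
    return checksum % 10 == 0


def find_indicators(text: str) -> Dict[str, List[str]]:
    """Find company emails, company IP addresses and plausible card numbers."""
    ips = []
    for candidate in IPV4_RE.findall(text):
        try:
            if ipaddress.ip_address(candidate) in COMPANY_NETWORK:
                ips.append(candidate)
        except ValueError:
            pass  # looked like an IP but is not valid, e.g. 999.1.1.1
    return {
        "emails": EMAIL_RE.findall(text),
        "ips": ips,
        "cards": [c for c in CARD_RE.findall(text) if luhn_valid(c)],
    }
```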

We are interested in hacker and cybercriminal websites, but the problem is that our crawlers will get stuck on irrelevant websites, such as drug and porn sites, which exist in huge quantities.

We need to find a way to classify websites based on their content!

My methodology is to sort Dark Web websites into two categories, relevant and irrelevant.

  • relevant: hacker and cybercriminal websites
  • irrelevant: any other websites

How can classification be done?

I use a dictionary of hacking and breach keywords.

First step: creating the training dataset.

I configure the crawlers to classify a website as relevant if more than 5 dictionary keywords appear in it; otherwise, the website is classified as irrelevant.

The result of this classification is saved as a CSV file with two columns: webpage text and classification.
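This labelling step can be sketched as follows. The keyword sample, the column names text and label, and the decision to count distinct keywords (rather than every occurrence) are my assumptions for the illustration. A real dictionary would be much larger.

```python
import csv
import re
from typing import Iterable, Set

# Tiny sample dictionary, for illustration only.
HACKING_KEYWORDS = {
    "exploit", "malware", "breach", "dump", "botnet",
    "ransomware", "0day", "carding", "phishing", "leak",
}


def count_keywords(text: str, keywords: Set[str]) -> int:
    """Number of distinct dictionary keywords present in the text."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(words & keywords)


def label_page(text: str, keywords: Set[str] = HACKING_KEYWORDS,
               threshold: int = 5) -> str:
    """'relevant' if more than `threshold` keywords appear, else 'irrelevant'."""
    return "relevant" if count_keywords(text, keywords) > threshold else "irrelevant"


def write_training_csv(pages: Iterable[str], path: str) -> None:
    """Save one row per page: the page text and its label."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["text", "label"])
        for text in pages:
            writer.writerow([text, label_page(text)])
```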

Final step: apply machine learning algorithms and choose the one with the highest accuracy.
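Here is a minimal sketch of that comparison using scikit-learn. The three candidate algorithms, the TF-IDF features, the 5-fold cross-validation, and the CSV column names text and label are all my assumptions. Other models and features may work better on your data.

```python
import csv
from typing import Dict, List, Tuple


def load_dataset(path: str) -> Tuple[List[str], List[str]]:
    """Read the training CSV (assumed header: text,label)."""
    with open(path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    return [row["text"] for row in rows], [row["label"] for row in rows]


def compare_models(texts: List[str], labels: List[str], folds: int = 5) -> Dict[str, float]:
    """Mean cross-validated accuracy for a few text classifiers."""
    # Third-party imports: pip install scikit-learn
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    candidates = {
        "naive_bayes": MultinomialNB(),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "linear_svm": LinearSVC(),
    }
    scores = {}
    for name, model in candidates.items():
        # The vectorizer is inside the pipeline so it is refit on every fold
        pipeline = make_pipeline(TfidfVectorizer(), model)
        fold_scores = cross_val_score(pipeline, texts, labels, cv=folds, scoring="accuracy")
        scores[name] = float(fold_scores.mean())
    return scores


def best_model(scores: Dict[str, float]) -> str:
    """Name of the model with the highest accuracy."""
    return max(scores, key=scores.get)
```

Keep in mind that accuracy here is measured against the keyword-based labels, so it tells you how well a model reproduces that rule on pages it has not seen.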

This way, we will be able to categorize websites efficiently using intelligent crawlers.

Dark Web Forums Analysis

As I mentioned, forum structures differ from website structures.

How do we identify our data in Dark Web forums?

In forums, we can search for our data by applying REGEX to posts and replies.

Some security researchers use other analytical techniques when monitoring Dark Web forums, such as sentiment analysis, relationship analysis, and authorship analysis.

Sentiment Analysis

This technique is used to identify the polarity (positive, negative, or neutral) of the opinions criminals express, by analyzing the forum posts they exchange.

Relationship Analysis

This technique is used to identify relationships between criminals based on their interactions (likes, posts).
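As a very simple illustration, interactions can be counted per pair of users, and the pairs that interact the most can be treated as the strongest relationships. The input format, a list of (user, other user) pairs such as who replied to or liked whom, and the function names are my assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def interaction_counts(interactions: Iterable[Tuple[str, str]]) -> Counter:
    """Count interactions for each unordered pair of users."""
    pairs: Counter = Counter()
    for source, target in interactions:
        if source != target:  # ignore users interacting with themselves
            pairs[tuple(sorted((source, target)))] += 1
    return pairs


def strongest_ties(interactions: Iterable[Tuple[str, str]],
                   top_n: int = 3) -> List[Tuple[Tuple[str, str], int]]:
    """The user pairs with the most interactions."""
    return interaction_counts(interactions).most_common(top_n)
```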

Authorship Analysis

Based on a criminal's posts in a forum, a kind of signature is associated with them. This technique allows us to track them across different forums, even when they use different usernames.
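One common way to build such a signature is to compare how often short character sequences appear in each author's writing. The sketch below uses character trigrams and cosine similarity. This is my choice of technique for the illustration, not necessarily what those researchers use, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def style_profile(posts: Iterable[str], n: int = 3) -> Counter:
    """Character n-gram counts over all of an author's posts."""
    profile: Counter = Counter()
    for post in posts:
        text = " ".join(post.lower().split())  # normalize case and whitespace
        profile.update(text[i:i + n] for i in range(len(text) - n + 1))
    return profile


def cosine_similarity(a: Counter, b: Counter) -> float:
    """1.0 for identical profiles, 0.0 when they share no n-gram."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Two usernames whose profiles score close to 1.0 are candidates for the same author, but a high score is only a lead to investigate, especially when there are few posts to compare.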

Dark Web Monitoring (Part 2) covers Dark Web Forums Monitoring in depth.

Copyright © 2020 Abdelkader Ben Ali, All Rights Reserved.

Abdelkader BEN ALI is a cyber threat intelligence analyst @spiderSilk.
He is passionate about designing, developing, and implementing customized Python-based tools and extending open source projects in order to simplify and automate security analysts' daily tasks.

You can connect with him on LinkedIn, Twitter.
