P1sty for Fraud Prevention

Abdelkader BEN ALI
10 min read · Aug 7, 2020


Keeping your organization secure isn’t just about detecting and responding to threats that have already exploited your systems. You must also prevent fraudulent use of your data and your brand.

When a data breach hits your organization, a third party, or a service provider, hackers gain access to personally identifiable information such as Social Security numbers, driver’s license numbers, bank account information, passwords, and dates of birth. That data can then be sold to or shared with criminals on dark web forums, deep web sites, and paste websites, who will use it to commit identity theft and make fraudulent financial charges.

In addition, attackers can use leaked credentials to gain access to your network or move laterally through it.

Note!! Fraud prevention means looking for payment card numbers or bank identification numbers (BINs) to prevent payment fraud, for credential dumps to prevent attackers from accessing your network or your employees’ mailboxes (e.g. when MFA is not in use), and, beyond that, for newly registered domains impersonating your company to conduct phishing attacks against you or your partners. All findings are then correlated in order to identify the threat group or individual responsible.

Web Layers

The web is divided into three layers: the Surface Web, the Deep Web, and the Dark Web.

Web Layers

The Surface Web is the regular Web that is visible to all Internet users. Surface Web websites are indexed by search engines such as Google, Bing, Yandex, etc. The Surface Web represents only about 4% of the content on the Internet.

Some people think that the Deep Web and the Dark Web are the same, which is wrong.

The Deep Web is the subset of Internet data that is not indexed by search engines but is accessible if you have the exact address.
For example, Google Drive documents and paste websites are considered part of the Deep Web because they cannot be accessed without a specific address or URL.

The Dark Web is a subset of the Deep Web that is not indexed by search engines and requires specialized anonymity systems to access it, such as The Onion Router (Tor) and the Invisible Internet Project (I2P). The Dark Web offers anonymous content hosting that encourages cyber criminals to share information, and therefore it can be a valuable area for identifying current and future threats.

Dark web website selling Bank Account Information

Dark Web Monitoring Challenges

I have been monitoring the dark web for a long time, so here are the challenges that you might encounter:

  • Dark web websites themselves are easy to analyze, but the huge volume of pornographic websites clogs up your crawlers.
  • Discovering dark web forums and websites is challenging due to the lack of a centralized index.
  • Some websites and forums are up in the morning and down in the afternoon; keeping track of a website or forum is really challenging when it changes its onion address.
  • Dark web forums often require membership. Some forums even require you to leak information about your organization, which is verified within 3 days before you are accepted as a member.
  • Dark web forums often require logins and dynamic CAPTCHAs, making it impossible for your crawlers to access them without your assistance.
  • Presence of numerous languages.
  • Multimedia files are significantly larger than indexable text files, resulting in longer download times and timeouts.
  • Processing and analyzing large amounts of data is difficult.

Proposed system architecture

In this article, I’m going to guide you through the development of a Python tool, “P1sty” (pronounced [pæsti]), that monitors paste websites and public dark web forums and sites, looking for any information that can be used to commit fraud against your organization.

Once something is detected, you will be notified in real time through email and SMS, and a MISP event will be created automatically.

Once notified, you should verify the alert; if it is a true positive, you should inform the fraud department, and you can also notify your MISP community.

System Architecture
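To make that flow concrete, here is a minimal sketch of a detection-and-alert step. Everything in it (the function names, the placeholder domain mydomain.com, and the BIN 123456) is illustrative and not the actual P1sty code; the email and MISP helpers are stubs that are fleshed out later in the article.

import re

# Placeholder patterns for a company whose domain is mydomain.com
# and whose card BIN is 123456 (both values are examples).
EMAIL_REGEX = re.compile(r"(?i)[a-z0-9._-]+@mydomain\.com")
CRED_REGEX = re.compile(r"(?i)[a-z0-9._-]+@mydomain\.com[:;][a-z0-9_\-*]+")
CARD_REGEX = re.compile(r"\b(1234[ -]?56\d{2}[ -]?\d{4}[ -]?\d{4})\b")

def notify_by_email(subject, body):
    # Stub: a concrete smtplib version is sketched in the "Email notification" section.
    print(f"[EMAIL] {subject}: {body}")

def create_misp_event(info, findings):
    # Stub: a concrete PyMISP version is sketched in the correlation section.
    print(f"[MISP] {info}: {findings}")

def process_paste(paste_url, raw_text, paste_username):
    """Run the detection regexes on one paste and raise alerts if anything is found."""
    findings = {
        "credentials": CRED_REGEX.findall(raw_text),
        "emails": EMAIL_REGEX.findall(raw_text),
        "cards": CARD_REGEX.findall(raw_text),
    }
    if any(findings.values()):
        notify_by_email(f"P1sty alert: {paste_url}", str(findings))
        create_misp_event(f"Leak found on {paste_url} (posted by {paste_username})", findings)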

Paste websites monitoring

The majority of the data dumps are being shared on paste sites such as pastebin.com, pastebin.fr, ghostbin.co, lpaste.net, selexy.org, etc.

Paste sites live on the deep web: they’re viewable in a regular internet browser, but their content is not indexed by Google and other conventional search engines.

There are also paste sites on the dark web that offer strong anonymity through Tor or I2P. For example, the dark web’s DeepPaste is used for advertising illegal services such as financial fraud, ransomware, child pornography, etc.

Credentials leak

Leaked credentials on Pastebin

Supposing your email is myemail@mydomain.com and your password is my_password, your credentials can be leaked in different formats:

  • myemail@mydomain.com:my_password
  • myemail@mydomain.com|my_password
  • myemail@mydomain.com,my_password
  • myemail@mydomain.com;my_password
  • Email Address: myemail@mydomain.com
    Password: my_password
  • Username: myemail@mydomain.com
    Password: my_password

A regular expression (regex) is a sequence of characters that defines a search pattern.

Email Regex

>>> import re
# regex for all emails belonging to mydomain.com
# (?i) makes the match case insensitive
>>> regex = r"(?i)[a-zA-Z0-9._-]+@mydomain\.com"
# text:
>>> text = "hello world jonathan@mydomain.com user@MyDomain.com jonathan@gmail.com mydomain.com this is only a test"
# Extract all emails belonging to mydomain.com from the text:
>>> re.findall(regex, text)
['jonathan@mydomain.com', 'user@MyDomain.com']

Credential Regex

# text:
>>> text = "hello world jonathan@mydomain.com:PaSS123*-* user@MyDomain.com test@MyDomain.com:Hello1234 jonathan@gmail.com test@mydomain.com;14525 this is only a test"
>>> import re
# Let's suppose you want to search for credentials related to mydomain.com with the format email:password.
>>> regex1 = r"(?i)[a-z0-9._-]+@mydomain\.com:[a-z0-9_\-*]+"
>>> re.findall(regex1, text)
Result: ['jonathan@mydomain.com:PaSS123*-*', 'test@MyDomain.com:Hello1234']
# Let's suppose you want to search for credentials related to mydomain.com with the format email;password.
>>> regex2 = r"(?i)[a-z0-9._-]+@mydomain\.com;[a-z0-9_\-*]+"
>>> re.findall(regex2, text)
Result: ['test@mydomain.com;14525']
# Let's suppose you want to search for both formats at once: email:password and email;password.
# Newer Python versions require the (?i) flag at the very start of the pattern,
# so we strip it from each sub-pattern and pass re.IGNORECASE instead.
>>> reg_list = [regex1.replace("(?i)", ""), regex2.replace("(?i)", "")]
>>> regex3 = re.compile('|'.join(reg_list), re.IGNORECASE)
>>> re.findall(regex3, text)
Result: ['jonathan@mydomain.com:PaSS123*-*', 'test@MyDomain.com:Hello1234', 'test@mydomain.com;14525']
# Let's suppose the credentials are shared in the format "email: <address> password: <password>".
>>> text = "100000 LinkedIn accounts email: email1@mydomain.com password: my_password1 email: email2@mydomain.com password: my_password2 email: email3@mydomain.com password: my_password3 have fun ;)"
>>> regex = "email: (.*?) password: (.*?) "
>>> re.findall(regex, text, re.DOTALL)
Result: [('email1@mydomain.com', 'my_password1'), ('email2@mydomain.com', 'my_password2'), ('email3@mydomain.com', 'my_password3')]
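The credential format list earlier also included | and , as separators. As a small sketch (still assuming the placeholder domain mydomain.com), all four separators can be folded into a single character class instead of joining several regexes:

>>> import re
>>> cred_regex = r"(?i)[a-z0-9._-]+@mydomain\.com[:|,;][a-z0-9_\-*]+"
>>> text = "jonathan@mydomain.com:PaSS123*-* user@mydomain.com|secret test@mydomain.com,hello test@mydomain.com;14525"
>>> re.findall(cred_regex, text)
['jonathan@mydomain.com:PaSS123*-*', 'user@mydomain.com|secret', 'test@mydomain.com,hello', 'test@mydomain.com;14525']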

Credit Card Dumps

Leaked Credit cards on Pastebin

If you are a bank, then you can look for leaked credit cards related to you ;)

General credit cards regex: r"\b(\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4})\b"

Credit card number

The BIN of this card is the first 6 digits (123456)

If your card BIN is 123456, then your card regex will be as below:

Regex: r"\b(1234[ -]?56\d{2}[ -]?\d{4}[ -]?\d{4})\b"

# text:
>>> text = "Cards Dumps 1234561234567852 5412152478963555 1234-5614-8756-7852 1234 5698 1245 1458"
>>> import re
>>> card_regex = r"\b(1234[ -]?56\d{2}[ -]?\d{4}[ -]?\d{4})\b"
>>> re.findall(card_regex, text)
Result: ['1234561234567852', '1234-5614-8756-7852', '1234 5698 1245 1458']
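The general 16-digit regex will also match random numbers that are not real cards. One common way to filter such false positives is the Luhn checksum that genuine payment card numbers satisfy; the sketch below is a generic illustration (4111111111111111 is a well-known test number), not part of the original P1sty code.

def luhn_valid(card_number: str) -> bool:
    """Return True if the digits pass the Luhn checksum used by payment cards."""
    digits = [int(d) for d in card_number if d.isdigit()]
    checksum = 0
    # Double every second digit from the right; subtract 9 when the result exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

# Keep only the regex matches that pass the checksum.
matches = ['4111111111111111', '1234-5614-8756-7852']
print([m for m in matches if luhn_valid(m)])
# ['4111111111111111'] -- the made-up number from the example above fails the check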

Note!! In most shared card or credential dumps you will find the bitcoin address and/or the email address of the criminal who wants to sell the full dump ;)

Criminal bitcoin and email addresses
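Those seller artifacts can be pulled out with regexes as well. The sketch below uses a generic bitcoin-address pattern (legacy base58 addresses starting with 1 or 3, and bech32 addresses starting with bc1) and a generic email pattern; the address shown is a well-known public vanity address used purely as an example, and the patterns are illustrative rather than exhaustive.

>>> import re
>>> text = "Full dump for sale, contact seller@protonmail.com BTC: 1BoatSLRHtKNngkdXEeobR76b53LETtpyT"
# Legacy addresses use base58 (no 0, O, I or l); bech32 addresses start with bc1.
>>> btc_regex = r"\b(?:bc1[a-z0-9]{25,39}|[13][a-km-zA-HJ-NP-Z1-9]{25,34})\b"
# Generic email regex to catch the seller's contact address (any domain).
>>> email_regex = r"(?i)[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}"
>>> re.findall(btc_regex, text)
['1BoatSLRHtKNngkdXEeobR76b53LETtpyT']
>>> re.findall(email_regex, text)
['seller@protonmail.com']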

What data are we going to extract from paste websites?

  • Paste link
  • Paste raw data
  • Paste username
Criminal username: HAILSEAN

Why the username?

For correlation, and because some paste websites let us retrieve all pastes created by a specific username ;)

All pastes created by username: HAILSEAN in pastebin.com
  • Company’s credentials and email addresses
  • Credit cards
  • Bitcoin address (of the person who shared the dump)
  • Email address (of the person who shared the dump)
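Taken together, each processed paste can be thought of as one record holding the fields above. A minimal sketch of such a container (illustrative only, not the actual P1sty data model):

from dataclasses import dataclass, field
from typing import List

@dataclass
class PasteRecord:
    """Illustrative container for the extracted fields listed above."""
    paste_link: str
    paste_raw_data: str
    paste_username: str
    company_emails: List[str] = field(default_factory=list)
    credentials: List[str] = field(default_factory=list)
    credit_cards: List[str] = field(default_factory=list)
    seller_bitcoin_addresses: List[str] = field(default_factory=list)
    seller_emails: List[str] = field(default_factory=list)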

Based on what are we going to correlate MISP events?

  • Card numbers
  • Email addresses
  • Paste website usernames
  • Bitcoin addresses
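One possible way to push these correlatable attributes into MISP is with the PyMISP library (pip install pymisp). The attribute types used below (email-src, cc-number, btc, text, link) are my suggested mapping rather than necessarily what P1sty uses, and the URL, key, and values are placeholders or examples taken from earlier in the article.

from pymisp import PyMISP, MISPEvent

# Placeholder MISP instance URL and authentication key.
misp = PyMISP("https://Misp_URL", "MISP key", ssl=False)

event = MISPEvent()
event.info = "Credential dump found on pastebin.com"
# The attributes below are what MISP will use to correlate events.
event.add_attribute("email-src", "jonathan@mydomain.com", comment="leaked company email")
event.add_attribute("cc-number", "1234561234567852", comment="leaked card number")
event.add_attribute("btc", "1BoatSLRHtKNngkdXEeobR76b53LETtpyT", comment="seller bitcoin address")
event.add_attribute("text", "HAILSEAN", comment="paste site username")
event.add_attribute("link", "https://pastebin.com/iHz82x5M", comment="paste link")

misp.add_event(event)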

How to create Web Crawlers

Note!! We are going to create a specific crawler for each paste website (focused crawlers).

Example 1: pastebin.com

Let’s start with the famous pastebin.com.

No Pro account available

Without a Pro account, there is no scraping API available for pastebin.com, so we will fall back to the traditional method xD

Pastebin archive

The archive URL https://pastebin.com/archive contains the latest 50 pastes.

Identifying latest pastes links

$ pip install scrapy
$ scrapy shell https://pastebin.com/archive
# Latest pastes IDs
>>> latest_pastes_ids = response.xpath('//tr/td[1]/a/@href').extract()
Result: [u'/1bjymdm5', u'/t1QRY3fH', ....., u'/gkZwQ5Ya', u'/VWnwXSBm']
# Latest pastes links
>>> latest_pastes_links = ["https://pastebin.com" + paste_id for paste_id in latest_pastes_ids]
Result: [u'https://pastebin.com/1bjymdm5', u'https://pastebin.com/t1QRY3fH', ....., u'https://pastebin.com/gkZwQ5Ya', u'https://pastebin.com/VWnwXSBm']

Extract raw text and username from paste

# extracting raw text + username from pastebin.com/iHz82x5M
$ scrapy shell https://pastebin.com/iHz82x5M
>>> raw_text = response.xpath('//textarea/text()').get()
>>> username = response.xpath("//div[@class='username']/a/text()").get()
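Putting the two scrapy shell explorations together, a minimal standalone spider might look like the sketch below. The class name, file name, and the placeholder domain mydomain.com are illustrative; this is not the generated P1sty code.

import re
import scrapy

class PastebinSpider(scrapy.Spider):
    name = "pastebin"
    start_urls = ["https://pastebin.com/archive"]
    # Placeholder company domain.
    email_regex = re.compile(r"(?i)[a-z0-9._-]+@mydomain\.com")

    def parse(self, response):
        # The archive page lists the latest 50 paste IDs.
        for paste_id in response.xpath("//tr/td[1]/a/@href").getall():
            yield response.follow(paste_id, callback=self.parse_paste)

    def parse_paste(self, response):
        raw_text = response.xpath("//textarea/text()").get() or ""
        username = response.xpath("//div[@class='username']/a/text()").get()
        emails = self.email_regex.findall(raw_text)
        if emails:
            yield {
                "paste_link": response.url,
                "paste_username": username,
                "emails": emails,
            }

If saved as pastebin_spider.py, it could be run with scrapy runspider pastebin_spider.py -o findings.json.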

Example 2: pastebin.fr

Pastebin.fr archive

The archive URL https://pastebin.fr/ contains the latest 100 pastes.

Identifying latest pastes links

$ scrapy shell https://pastebin.fr/
# getting latest pastes links
>>> latest_pastes_links = response.xpath("//a[contains(@href,'http://pastebin.fr/')]/@href").getall()
Result: [u'http://pastebin.fr/63652', u'http://pastebin.fr/63651',......u'http://pastebin.fr/63498']

Extracting raw text and username

# extracting raw text + username from pastebin.fr/63628
$ scrapy shell http://pastebin.fr/63628
>>> raw_text = response.xpath("//textarea/text()").get()
>>> username = response.xpath("//div/h1/em/text()").get()

Since the crawling methodology is the same for the majority of paste websites and developing a specific crawler for each one takes time, I decided to develop P1sty.

P1sty

P1sty is a Python tool that allows you to generate a fully coded P1sty spider for any given paste website in less than a second.

P1sty’s spider functionalities:

  • Monitor a given paste website for occurrences of your business name, domain, emails, credentials, credit cards, etc.
  • Automatically create MISP events.
  • Real-time email notification.

Usage

$ python p1sty.py
Usage: p1sty.py [options]

Options:
  -h, --help            show this help message and exit
  -n NAME, --name NAME  Spider name
  -s START, --start START
                        Archive url
  -m MISP, --misp MISP  Misp events url
  -k KEY, --key KEY     Misp key
  -lpx LPX              Latest pastes XPath
  -rpx RPX              Raw paste data XPath
  -ux UX                Username XPath
  -d DOMAIN, --domain DOMAIN
                        Company domain
  -b BIN, --bin BIN     Company BIN
  -se SENDER, --sender SENDER
                        Sender email address
  -p PASSWORD, --password PASSWORD
                        Sender email password
  -r RECIPIENT, --recipient RECIPIENT
                        Recipient email

Example 1: pastebin.com

$ python p1sty.py -n "pastebin" -s "http://pastebin.com/archive" -m "https://Misp_URL/events" -k "MISP key" -lpx "//tr/td[1]/a/@href" -rpx "//textarea/text()" -ux "//div[@class='username']/a/text()" -d "p1sty.com"

Example 2: pastebin.fr

$ python p1sty.py -n "pastebin_fr" -s "http://pastebin.fr/" -m "https://Misp_URL/events" -k "MISP key" -lpx "//a[contains(@href,'http://pastebin.fr/')]/@href" -rpx "//textarea/text()" -ux "//div/h1/em/text()" -d "facebook.com"
Generating P1sty spider

The generated spider will monitor pastebin.fr, looking for occurrences of the domain “p1sty.com” and any URL/email/credential related to it, as well as credit cards with BIN 123456.

Below you can find some parts of the generated P1sty spider code.

Let’s test it ;)

To start, we first need to launch the P1sty spider we created:

$ scrapy crawl fr_pastebin

I am going to write 3 pastes:

1- In the first paste I will mention my company domain: p1sty.com.

2- In the second paste I will mention multiple emails related to my company.

3- In the third paste I will mention multiple credentials related to my company.

Result

3 events automatically created on my MISP instance

First event: Company domain found in pastebin.fr

Second event: 3 emails found in pastebin.fr

Third event: 3 creds found in pastebin.fr

Events correlation:

Through MISP we were able to correlate the second and third events based on the username (“jonathan”) of the person who posted both pastes and the occurrence of emails related to p1sty.com.

Events Correlation Graph

Email notification

If you want to receive email notifications, you need to provide a sender email address, the sender’s password, and a recipient email address.
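A minimal sketch of such a notification using Python’s standard smtplib is shown below; the SMTP server, port, and addresses are placeholders, and P1sty’s actual implementation may differ.

import smtplib
from email.message import EmailMessage

def notify_by_email(sender, password, recipient, subject, body,
                    smtp_server="smtp.gmail.com", smtp_port=465):
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    # SMTP over SSL; the sender account must allow SMTP authentication.
    with smtplib.SMTP_SSL(smtp_server, smtp_port) as server:
        server.login(sender, password)
        server.send_message(msg)

notify_by_email("sender email", "sender email password", "recipient email",
                "P1sty alert", "Company domain found on pastebin.fr")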

Let’s try it on https://paste.debian.net/

https://paste.debian.net/ contains the latest 10 pastes.

Latest pastes links xpath: //a[contains(@href,'//paste.debian.net/1')]/@href

Raw paste data xpath: //td[@class='code']/div/pre/text()

Paste username xpath: //div[@id='content']/h1/text()

$ python p1sty.py -n "debian_paste" -s "https://paste.debian.net" -m "https://Misp_URL/events" -k "MISP key" -lpx "//a[contains(@href,'//paste.debian.net/1')]/@href" -rpx "//td[@class='code']/div/pre/text()" -ux "//div[@id='content']/h1/text()" -d "p1sty.com" -se "sender email" -p "sender email password" -r "recipient email"
$ scrapy crawl debian_paste
Posting 4 emails related to P1sty.com
Real time email notification

It is really simple :) With P1sty you can start monitoring any paste website in less than a second.

You can find the full project on my GitHub.

In the following articles, I’ll walk you through the extension of P1sty to monitor the Dark Web and the integration of machine learning.

Note!! For those who haven’t used MISP, I will publish an article detailing its usage.

Copyright © 2020 Abdelkader Ben Ali, All Rights Reserved.

Abdelkader BEN ALI is a cyber threat intelligence analyst @spiderSilk.
He is passionate about designing, developing, and implementing customized Python-based tools and extending open source projects in order to simplify and automate security analysts’ daily tasks.

You can connect with him on LinkedIn and Twitter.
