Combating human trafficking using machine learning: Part 1.

Juanchobanano
Jun 2, 2022


Photo from Canva.

Hey! Welcome to my first article on Medium. In this series of posts I summarize part of my progress on a side project related to the identification of human trafficking networks on a Canadian escort listing website. In this first post I introduce the concept of human trafficking and then present Chain Breaker: an open-data-based solution for combating this crime.

Let’s start!

But, what is human trafficking?

The United Nations Trafficking in Persons Protocol defines human trafficking as:

“recruitment, transportation, transfer, harboring or receipt of persons, by means of the threat or use of force or other forms of coercion, of abduction, of fraud, of deception, of the abuse of power or of a position of vulnerability or of the giving or receiving of payments or benefits to achieve the consent of a person having control over another person, for the purpose of exploitation” (Art 3 (a) of the UN Trafficking in Persons Protocol, 2000).

It is possible to identify at least nine forms of human trafficking; however, I focus on one of its faces: sexual exploitation. Sexual exploitation is defined as forcing, deceiving, or coercing a person to take part in sexual activities. When this form of exploitation involves children, it is known as Child Sexual Exploitation. According to the United Nations Global Report on Trafficking in Persons (2020), sexual exploitation of women and girls continues to be the most detected form of human trafficking worldwide. This problem is among the most challenging ones facing law enforcement, as it is difficult to identify victims and counter traffickers. It is also one of the fastest-growing criminal industries worldwide and one of the most severe human rights violations today.

Due to the increasing use of internet technologies around the world and important factors such as the COVID-19 pandemic,

“traffickers have also adapted their modus operandi for cyberspace by taking advantage of digital platforms to advertise, recruit and exploit victims. Through the internet, traffickers easily gain access to an increased pool of customers, particularly sex buyers. In some cases, traffickers hide exploitative services behind legitimate activities to avoid detection by law enforcement while targeting their client base […] carefully crafted language with coded keywords and pictures are used to attract potential clients” (UNODC, 2020).

The Global Report on Trafficking in Persons (UNODC, 2020) includes a dataset with several real human trafficking cases related to classified advertising websites. One of the most prominent cases is the listing site Backpage, a US classified website that was seized by the US Federal Bureau of Investigation in April 2018. Although Backpage and many other similar sites have been shut down in recent years by different law enforcement institutions, many listing websites are still online.

Emerging websites

After the Backpage shutdown, many other websites emerged as destinations for human trafficking. In 2019, a new Canadian website drew the attention of Canadian police due to concerns that some of its users might be facilitating this crime. In fact, according to a Toronto police officer,

Now we’re seeing more and more victims [of human trafficking] being advertised on that site, and it seems to me … that it’s actually taken over from Backpage and is the number one site that pimps are using to advertise the sexual services of the victims they’re forcing into the sex trade (CBC, 08 Feb 2019)

Unfortunately, shutting down these listing websites might not be the best approach to deal with this issue, because it would push these activities further underground into more clandestine channels. In addition, it often makes it more challenging for law enforcement to conduct investigations, since most of the data would also be deleted.

So, what are the main problems?

  1. Human trafficking on listing websites: Traffickers are using cyberspace to advertise, recruit and exploit victims. Shutting down these websites is infeasible, since traffickers would either move to a new website or start offering more clandestine services.
  2. Manual work: Nowadays, many prosecutors keep an eye on these listing websites to find potential human trafficking cases. However, this work is very time-consuming, and it is usually very difficult to see the big picture, i.e., to understand how all these advertisements are (possibly) connected with the same criminal organizations.
  3. Money laundering: Traffickers receive and send money through the services of several financial institutions. Current approaches to this problem are based on identifying anomalous transactions; however, traffickers are increasingly sophisticated at hiding their activities.

So… what can we do?

Chain Breaker — Help us combat human trafficking using data

Chain Breaker Site — Help us combat human trafficking using data (https://chainbreaker.community/)

Chain Breaker (https://chainbreaker.community/) is a cloud-based tool that automatically searches, extracts, stores and analyzes information from websites that offer sexual services to identify potential victims of sexual exploitation, suspected criminal human trafficking networks, and sex industries suspected of concealing exploitative activities.

It uses web scraping to collect phone numbers, emails, names, locations and pictures, and natural language understanding to extract the linguistic characteristics of advertisements associated with a particular criminal network or sex industry suspected of hiding exploitation activities.
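
As a rough illustration of the scraping step, here is a minimal sketch of how a few of these fields could be pulled from an advertisement page. The URL, CSS selectors and regular expressions below are hypothetical placeholders of my own, not the actual site's markup or Chain Breaker's internal code.

```python
import re
import requests
from bs4 import BeautifulSoup

def scrape_ad(url: str) -> dict:
    """Extract a few illustrative fields from a (hypothetical) ad page."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    # The selectors below are placeholders; every site uses its own markup.
    title = soup.select_one("h1.ad-title")
    body_tag = soup.select_one("div.ad-body")
    body = body_tag.get_text(" ", strip=True) if body_tag else ""

    # Phone numbers are often written with spaces, dashes or dots,
    # so a permissive pattern is applied over the raw ad text.
    phones = re.findall(r"\+?\d[\d\s\-.]{8,}\d", body)
    emails = re.findall(r"[\w.+-]+@[\w.-]+\.\w+", body)

    return {
        "link": url,
        "title": title.get_text(strip=True) if title else None,
        "text": body,
        "phones": phones,
        "emails": emails,
    }
```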

The tool seeks to provide high-quality and structured data to anti-trafficking organizations, law enforcement, financial institutions, and companies to support analysts and investigators in identifying online dynamics linked to potential sexual exploitation activities.

Data Ingestion and Preprocessing

In this series of posts, we use data extracted from the “escorts” category of one of the biggest listing sites in Canada, and use this information to build knowledge graphs and characterize their emerging communities in order to detect suspicious clusters according to the criteria of prosecutors working in this field. Chain Breaker extracts the following fields from each advertisement (a small record sketch follows the list):

- Link (advertisement URL)
- ID Page (unique advertisement identifier)
- Title
- Text
- Category
- Publication date
- Phone number
- Email
- Region
- City
- Place (specific address within a city)
- Latitude & Longitude
- External website (some advertisements include a URL to another website where they also promote their services; this might include sites such as Twitter, OnlyFans, Facebook, private websites, among others)
- Ethnicity (Asian, Caucasian, Black, mixed, etc.)
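
For concreteness, a single advertisement can be thought of as a record with these fields. The sketch below is only an illustrative Python representation; the actual schema Chain Breaker uses internally is not published here, so the field names and types are my own guesses.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Advertisement:
    # Field names mirror the list above; the real Chain Breaker schema may differ.
    link: str
    id_page: str
    title: str
    text: str
    category: str
    publication_date: str
    phone: Optional[str] = None
    email: Optional[str] = None
    region: Optional[str] = None
    city: Optional[str] = None
    place: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    external_websites: List[str] = field(default_factory=list)
    ethnicity: Optional[str] = None
```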

Once the data is obtained, only the region and city variables are reformatted: accent marks are removed and the values are lowercased for standardization, while the title and text variables are kept in their original form. All this information is uploaded to MySQL and Neo4j (a graph database) to exploit the emerging relationships in the data through the phone number, email and external website variables (however, we will do this in the next posts!).
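
A minimal sketch of that normalization step, using only the Python standard library (the function name is mine, not part of Chain Breaker):

```python
import unicodedata

def normalize_location(value: str) -> str:
    """Remove accent marks and lowercase a region or city name."""
    # NFKD decomposition splits base characters from combining accent marks,
    # which are then dropped before lowercasing.
    decomposed = unicodedata.normalize("NFKD", value)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower().strip()

print(normalize_location("Québec"))    # quebec
print(normalize_location("Montréal"))  # montreal
```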

Accessing the data (Kaggle dataset!)

I want Chain Breaker to become a community of data scientists who join efforts to fight human trafficking. That’s why, from now on, the information extracted by Chain Breaker will be accessible to anyone and will be published periodically on Kaggle.

The dataset consists of 3,463 advertisements; however, the first 1,924 advertisements do not contain the ethnicity or age variables. In addition, the link and page ID fields are removed from the dataset, and the phone number, email and external website fields are hashed using the SHA-256 algorithm for security purposes. This does not affect the emerging connections that can be found in the data, as illustrated below. The dataset can be accessed on Kaggle using the following link.
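
Because hashing is deterministic, two advertisements that share the same phone number (or email, or external website) still map to the same hashed value, so the links between ads survive even though the raw values are hidden. A quick illustration with made-up numbers:

```python
import hashlib

def sha256_hex(value: str) -> str:
    """Hash a contact field (phone, email, external website) with SHA-256."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

ad_a_phone = "+1 555 010 0001"  # made-up numbers, for illustration only
ad_b_phone = "+1 555 010 0001"
ad_c_phone = "+1 555 010 0002"

# Identical inputs produce identical digests, so ads A and B can still be
# linked through their hashed phone number without exposing the raw value.
print(sha256_hex(ad_a_phone) == sha256_hex(ad_b_phone))  # True
print(sha256_hex(ad_a_phone) == sha256_hex(ad_c_phone))  # False
```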

What’s next?

In the next post, I present the features that we seek to extract from our dataset, based on previous research on this topic. In addition, I show how we can represent our dataset as a graph and why this representation matters.

References

  • CBC (8 Feb 2019). After Backpage shutdown, Toronto police say LeoList is emerging as a destination for human trafficking.
  • UNODC (2020). Global Report on Trafficking in Persons.


Juanchobanano

Hi! My name is Juan Esteban Cepeda. I’m a computer scientist interested in human trafficking, computational consciousness and reinforcement learning.