Data Against Human Trafficking

Stefan Hall
Trends in Data Science
11 min read · Mar 22, 2021

Today, millions of people worldwide suffer at the hands of organisations and individuals who exploit them for economic gain. Thanks to advocacy groups and a growing number of news reports, public awareness is slowly growing that there are more slaves in the world today than at any point in human history (Hodal, 2019). This global slave economy, estimated at 40.3 million people, relies on the abduction, coercion, trafficking and exploitation of vulnerable people (Walk Free Foundation, 2018). Even that number is only an approximation, as it is currently very difficult to measure accurately how many people are trafficked worldwide (Do Something, 2020). Indeed, the main difficulty in developing solutions to this problem is the lack of data.

In response, researchers and activists around the world are finding ingenious methods of collecting and sharing data, identifying where and when trafficking is occurring and promoting clear, data-driven legislation to curtail this problem. International databases are being built with uniform data standards and machine learning algorithms are being trained on what data is available. Despite challenges and setbacks, these innovations are making an impact.

The Challenge

Victims of trafficking are found in every country in the world, but the crime itself and its markers often remain hidden (UNODC, 2020). Where it was once conducted in physical black markets or underground business networks, it has now adapted to new technologies and thrives in the digital world. Criminals are fluent in the use of social media and online advertising and are commonly reported to use a combination of smartphone apps and internet-dependent business models for recruiting victims and managing their finances (UNODC, 2020). These criminals do leave digital footprints, but those footprints are often blurred or hidden, and much of their activity takes place on the ‘dark web’, out of reach of regular search engines and accessible only through browsers such as Tor.

From Daire, Seth. (2015). Memex helps find human trafficking cases online. https://humantraffickingcenter.org/memex-helps-find-human-trafficking-cases-online/

Past Attempts

In the past, law enforcement agencies attempted to address this lack of data by identifying organised crime groups through wide-ranging qualitative and quantitative surveys. Researchers compiled a Composite Organised Crime Index to identify the countries where organised crime was most active (Dijk, 2007). While this pointed agencies in the right direction, these measures were broad and did not identify specific locations.

A number of organisations thus began collecting the necessary data. Most notable, perhaps, is the United Nations Office on Drugs and Crime, which has now been collecting data for decades (UNODC, 2020). Organisations within various nations have also begun creating databases with significant amounts of information on both trafficking rings and their victims. With this growing scale of data, though, comes the challenge of reconciling national data for an international problem.

Along comes Google.

The Hotline

Through its philanthropic arm, Google.org, Google awarded $3 million to the Polaris Project, Liberty Asia and La Strada International to aid their development of an international hotline for trafficking victims (Cohen & Fuller, 2013). The resulting Global Human Trafficking Hotline Network collects self-disclosed data from victims of trafficking around the world.

This might seem like an outdated solution to a modern problem, but the most notable and, indeed, most cutting-edge feature of the hotline is its uniform data collection standard.

That sounds underwhelming.

What this means is that data collected in one country can be easily read, understood and analysed in another. The standard was also developed with continued rollout in mind and can easily be deployed to more countries (Grothaus, 2013). The data gathered from these phone calls can then be collated with local government and medical data to help identify individuals and, hopefully, lead to their rescue and recovery. This is exactly what is needed for a crime that crosses so many borders.
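The exact schema behind the hotline network has not been published, but a rough sketch can show what a uniform record standard buys you. The Python below defines a hypothetical, minimal call record (every field name here is an assumption, not Polaris’ actual format) that any national hotline could emit and any partner organisation could parse without translation.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class HotlineRecord:
    """Hypothetical minimal record; the real hotline schema is not public."""
    record_id: str          # opaque identifier, never the caller's name
    country: str            # ISO 3166-1 alpha-2 code, e.g. "US" or "NL"
    received_at: str        # ISO 8601 timestamp in UTC
    trafficking_type: str   # controlled vocabulary, e.g. "labour", "sex", "unknown"
    referral_made: bool     # whether the call was referred to local services

def to_wire(record: HotlineRecord) -> str:
    """Serialise a record as JSON so any partner in the network can read it."""
    return json.dumps(asdict(record), ensure_ascii=False)

def from_wire(payload: str) -> HotlineRecord:
    """Parse a record produced by any hotline in the network."""
    return HotlineRecord(**json.loads(payload))

# A record created by one national hotline...
record = HotlineRecord(
    record_id="rec-0001",
    country="US",
    received_at=datetime.now(timezone.utc).isoformat(),
    trafficking_type="labour",
    referral_made=True,
)

# ...can be read and analysed unchanged by any other partner.
print(from_wire(to_wire(record)))
```

Because the fields and their vocabularies are fixed in advance, an analyst in one country never has to guess what a record from another country means.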

The resulting databases, while valuable for analysts and law enforcement, consequently contain significant amounts of personal information. Without filters, an analyst could potentially access a victim’s credit history or medical records. This is why a granular security model, designed by Palantir, a US-based technology company, was engineered into the database. The model allows access only to the specific data points relevant to an analyst’s task, so analysts are not overwhelmed with sensitive information (Grothaus, 2013). This promising innovation from Google systematises data in a way that is internationally uniform and securely accessible, something that a complex international crime like trafficking requires.
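Palantir has not published the details of that model, so the following is only a toy illustration of the general idea of field-level access control: each analyst role carries an explicit allow-list of fields, and anything not granted, such as medical or financial details, simply never reaches them. The roles and field names are invented for the example.

```python
# Toy illustration of field-level ("granular") access control.
# The record fields and roles below are hypothetical, not Palantir's actual model.
RECORD = {
    "case_id": "case-7731",
    "country": "US",
    "trafficking_type": "sex",
    "medical_notes": "<sensitive>",
    "credit_history": "<sensitive>",
}

# Each role is granted an explicit allow-list of fields.
ROLE_GRANTS = {
    "pattern_analyst": {"case_id", "country", "trafficking_type"},
    "case_worker": {"case_id", "country", "trafficking_type", "medical_notes"},
}

def view(record: dict, role: str) -> dict:
    """Return only the fields the given role is allowed to see."""
    allowed = ROLE_GRANTS.get(role, set())
    return {key: value for key, value in record.items() if key in allowed}

print(view(RECORD, "pattern_analyst"))  # no medical or financial fields
print(view(RECORD, "case_worker"))      # medical notes visible, finances still hidden
```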

The Impact

Since its adoption in the US in 2007, 63,380 human trafficking cases have been reported and hundreds of thousands of contacts have been received by the hotline (Polaris, 2019). This service is seeing traffickers stopped and victims rescued.

But in the fight against human trafficking as a whole, this tool has significant limitations. Firstly, the organisations providing the hotlines have limited capacity to receive calls, input accurate data for analysis and allocate the necessary resources. Likewise, a hotline in a country without adequate infrastructure, or where there is low trust in the government’s capacity and integrity, will see limited success (OMCTP, 2021). But the biggest limitation of these hotlines is the comparatively small amount of data coming in. A victim needs access to a phone, needs the right number and needs the time and space to make a call, and sometimes none of these is possible. Only when a call is made is relevant data piped into the database. Each individual making a call cannot be ignored and getting them help is of critical importance, but at the scale of this global problem, further innovation is required.

Machine Learning is hard

Fortunately, there are researchers working to solve this problem. By teaching machine learning algorithms to sift through hundreds of thousands of online advertising materials and extract relevant information, more data can be found to fight human trafficking.

This is no miracle cure, though, and there are significant hurdles to using this strategy. First and foremost is the lack of datasets available on human trafficking with which to train these algorithms (Fedorschak, Kandala, Desouza, & Krishnamurthy, 2014). The most common remedy has been to scrape data from the social media platforms and online advertisers where trafficking takes place. The challenge is that these advertisements often do not follow normal semantic conventions and are intentionally obfuscated so they cannot be scraped, searched or extracted meaningfully (Kejriwal & Szekely, 2015). The results are often noisy, fuzzy and unstructured.
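To make the obfuscation problem concrete, here is a small sketch of the kind of normalisation a scraper might attempt before any analysis: mapping spelled-out digits back to numerals and pulling out phone-number candidates. The ad text is fabricated, real ads use far more varied tricks, and as the output shows, even after cleaning the result is still noisy.

```python
import re

# A fabricated, deliberately obfuscated snippet of the kind written to dodge filters.
raw = "avail t0night, call two one 2 five five 5 zero one 9 eight"

WORD_DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

def normalise(text: str) -> str:
    """Lower-case the text and map spelled-out digits to numerals."""
    tokens = []
    for token in text.lower().split():
        token = token.strip(",.")
        tokens.append(WORD_DIGITS.get(token, token))
    return " ".join(tokens)

def digit_run(text: str) -> str:
    """Concatenate every digit in order: a crude phone-number candidate."""
    return "".join(re.findall(r"\d", text))

cleaned = normalise(raw)
print(cleaned)             # "avail t0night call 2 1 2 5 5 5 0 1 9 8"
print(digit_run(cleaned))  # "02125550198": the stray '0' in "t0night" still leaks in
```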

From Kejriwal, M. & Szekely, P (2015). Knowledge Graphs for Social Good: An Entity-centric Search Engine for the Human Trafficking Domain. Journal of Latex Class Files, 1–15

Add to this that, once the data has been scraped, researchers need to establish ‘ground truths’, advertisements clearly identified as having been posted by a trafficker, to give the algorithms a frame of reference (Portnoff, 2018). This is very difficult because advertisements are often taken down after a few days, making it hard to establish patterns of behaviour over time (Kejriwal & Szekely, 2015). Creating ‘gold standard’ datasets large enough to train intelligent algorithms is therefore very time-consuming and costly (Kejriwal & Kapoor, 2019).
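A ‘gold standard’ here simply means a set of examples whose labels are trusted. The sketch below, using entirely fabricated ad text and labels, shows the bare mechanics researchers rely on: a handful of hand-labelled ads, a text vectoriser and a classifier. With data this small the model learns nothing that generalises, which is precisely the cost problem described above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# A fabricated hand-labelled 'gold standard': 1 = suspected trafficking ad, 0 = benign.
ads = [
    "new girls arrived this week, no outcalls, cash only",
    "young talent available, discreet location, ask for times",
    "licensed massage therapist, bookings via official website",
    "certified physiotherapy clinic, insurance accepted",
    "fresh faces every night, management handles all calls",
    "yoga instructor offering group classes downtown",
]
labels = [1, 1, 0, 0, 1, 0]

# Standard text-classification mechanics: vectorise the text, then fit a classifier.
vectoriser = TfidfVectorizer(ngram_range=(1, 2))
features = vectoriser.fit_transform(ads)
model = LogisticRegression().fit(features, labels)

# Six examples are nowhere near enough to generalise, which is exactly why
# building a gold standard large enough to matter is slow and expensive.
unseen = vectoriser.transform(["new faces this week, management takes all calls"])
print(model.predict_proba(unseen))
```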

Without sufficient ‘gold standards’, it is not possible to tune an algorithm that generalises across domains, given the sheer variety of different websites. Nor can algorithms trained on regular websites be reused in trafficking domains, because the two differ so starkly (Kejriwal & Szekely, 2015). When an algorithm is trained, it is usually adept at sifting through only one specific domain, not the thousands that are required.

Machine learning is hard.

But not impossible

Despite this, researchers have developed multiple proofs of concept with access to only small, hand-labelled datasets, approaches which, when scaled, could significantly hamper trafficking worldwide (Alvari & Shakarian, 2017). Backpage.com, a now-defunct domain with a notorious adult section, provided one such dataset. The data was analysed and commonly occurring phone numbers were passed to law enforcement officers, giving them leads into criminal organisations and confirming the presence of traffickers (Datta, 2014). While not sufficient in itself, this dataset has been the primary ‘gold standard’ and a starting point for many researchers building useful algorithms.
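Datta (2014) does not publish code, but the core idea of surfacing commonly occurring phone numbers is easy to sketch. On the fabricated ad snippets below, a regular expression pulls out phone-number candidates, differently formatted numbers are reduced to bare digits, and a simple count shows which numbers recur across many ads: the kind of lead that was handed to investigators.

```python
import re
from collections import Counter

# Fabricated ad snippets standing in for scraped listings.
ads = [
    "New in town, call 212-555-0198 anytime",
    "Two new girls, text (212) 555 0198 for details",
    "Ask for Mia 646.555.0123",
    "Back again this weekend! 2125550198",
]

PHONE = re.compile(r"\(?\d{3}\)?[\s.-]*\d{3}[\s.-]*\d{4}")

def canonical(number: str) -> str:
    """Strip punctuation and spaces so differently formatted numbers compare equal."""
    return re.sub(r"\D", "", number)

counts = Counter(
    canonical(match)
    for ad in ads
    for match in PHONE.findall(ad)
)

# Numbers recurring across many independent ads are potential investigative leads.
print(counts.most_common())  # [('2125550198', 3), ('6465550123', 1)]
```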

These algorithms are capable of making a real difference. Using what little training data they had, researchers built machine learning models to parse inscrutable terminology on dark web forums, and anomaly detection algorithms were able to narrow the set of potential traffickers to investigate, saving law enforcement time and resources (Portnoff, 2018). Other tools mined geospatial metadata to identify trends and determine, in real time, where trafficking rings were active (Grothaus, 2013). Many of these concepts and tools have been combined by technology companies into analytics suites that are deployed to law enforcement agencies and still used today.
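Portnoff (2018) describes classifiers and detectors built for the dark web specifically; the sketch below only illustrates the general anomaly detection idea on invented per-poster features (number of ads, distinct cities, distinct phone numbers), using scikit-learn’s IsolationForest to flag posting behaviour that looks unusual so that investigators have a much shorter list to review.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical per-poster features: [ads posted, distinct cities, distinct phone numbers].
# Real systems derive far richer features from scraped ads; these values are invented.
posters = np.array([
    [3, 1, 1],
    [5, 1, 1],
    [4, 2, 1],
    [6, 1, 2],
    [2, 1, 1],
    [40, 9, 6],   # posts heavily across many cities and numbers
    [5, 2, 1],
    [3, 1, 1],
])

detector = IsolationForest(contamination=0.12, random_state=0).fit(posters)
flags = detector.predict(posters)  # -1 marks an anomaly

# Only the flagged posters go forward for manual review.
for features, flag in zip(posters, flags):
    if flag == -1:
        print("review:", features)
```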

In the United States, there are currently three notable suites:

· ‘Spotlight’ filters through a host of sex ads every day to help identify victims, especially children, and gives a picture of where the crime is being committed (Digital Reasoning, 2019).

· ‘Traffic Jam’, used by law enforcement, applies AI tools including facial recognition to help find victims. In 2019, approximately 3,800 victims were identified using this suite (Marinus Analytics, 2020).

· ‘Memex’, developed by researchers at DARPA, can filter through thousands of raw HTML pages, identify relationships between distinct data points and potentially map the movement of traffickers (Greenemeier, 2015); a minimal sketch of this entity-linking idea follows the figure below. The Memex program also includes its own query engine and language, which can scale to much larger datasets as they become available (Kejriwal & Szekely, 2015).

From Hazy Research Group (2016). MEMEX / Human trafficking. http://deepdive.stanford.edu/showcase/apps#
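Memex is a large research program with its own query engine, so the following is only a minimal sketch of the underlying entity-centric idea on fabricated data: treat each ad and each attribute extracted from it (here just a phone number) as nodes in a graph, link ads that share an attribute, and the connected clusters that emerge approximate distinct operations, with their cities hinting at movement.

```python
import networkx as nx

# Fabricated extractions: each ad with the phone number and city pulled from its HTML.
ads = {
    "ad_01": {"phone": "2125550198", "city": "New York"},
    "ad_02": {"phone": "2125550198", "city": "Boston"},
    "ad_03": {"phone": "6465550123", "city": "Newark"},
    "ad_04": {"phone": "2125550198", "city": "Philadelphia"},
}

# Link each ad to the phone number it mentions; ads sharing a number become connected.
graph = nx.Graph()
for ad_id, attributes in ads.items():
    graph.add_edge(ad_id, "phone:" + attributes["phone"])

# Each connected component approximates one operation; its cities hint at its movements.
for component in nx.connected_components(graph):
    ad_ids = sorted(node for node in component if node in ads)
    cities = [ads[ad_id]["city"] for ad_id in ad_ids]
    print(ad_ids, "->", cities)
```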

They still make a difference

The byproduct of using these tools is a better understanding of this crime and swathes of evidence for prosecutors. After the implementation of the Memex system in New York, for example, the share of prostitution-related arrests that led to human trafficking investigations rose from below 1% to above 62% (Kejriwal & Szekely, 2015), and New York’s district attorney has directly credited Memex with building pools of evidence that, in some cases, eliminated the need for a victim to testify (Greenemeier, 2015).

Data collection is just the beginning

These tools are leading to a better understanding of the nature of this crime, and each year more and more traffickers are being prosecuted (UNODC, 2020). However, capturing and halting traffickers, while necessary for justice, will not stop the problem. Arrests and rescues must be supplemented with a targeted reduction of the economic benefits of using trafficked people, especially through sensitive, targeted legislation (Savona & Stanizzi, 2007). To facilitate holistic change, more accurate measures are required that focus on the specific markets driving the demand for trafficking, and evidence-based, globally harmonised legislation must be developed in response.

This is possible.

The UN has been pushing for, and seeing success in, global legislative change to fight this problem since 2000 (UNODC, 2004). To aid this effort, more accurate and comprehensive data needs to be collected, without which sufficient laws to disincentivise both traffickers and their beneficiaries cannot be passed (Savona & Stanizzi, 2007). Indeed, human trafficking legislation that contains evidence- and research-based language is more likely to pass through government and be enacted as law (Scott, Ingram, Nemer, & Crowley, 2019). Human trafficking is nuanced, with different forms of trafficking requiring different measurements and different laws (Savona & Stanizzi, 2007), but all are worth fighting.

From UNODC (2020). Global Report on Trafficking in Persons. https://www.unodc.org/documents/data-and-analysis/tip/2021/GLOTiP_2020_15jan_web.pdf

Data must continue to be collected and stored in uniform, securely accessible formats. Machine learning algorithms must be trained to mine deeper, wider and smarter, and, most importantly, good, holistic legislation must be adopted internationally. All of this depends on sufficient data. Fortunately for those who fight trafficking, as the world becomes more data-intensive, so too does the ability to fight this crime. Little by little, enslaved people are being found and set free, protective lines are being drawn around the vulnerable and traffickers are being caught. Global systems, be they legal, logistical or digital, can be and are being engineered to disincentivise and punish the exploitation of persons. With good data and new innovations, ending human trafficking is possible.

References

Alvari, H., & Shakarian, P. (2017). Semi-Supervised Learning for Detecting Human Trafficking. Security Informatics, 1–24.

Cohen, J., & Fuller, J. (2013, April 9). Fighting human trafficking. Retrieved from Google.org: https://blog.google/outreach-initiatives/google-org/fighting-human-trafficking/

Daire, S. (2015, May 13). Memex Helps Find Human Trafficking Cases Online. Retrieved from Human Trafficking Center: https://humantraffickingcenter.org/memex-helps-find-human-trafficking-cases-online/

Datta, M. N. (2014). Using Big Data and Quantitative Methods to Estimate and Fight Modern Day Slavery. The SAIS Review of International Affairs, 21–33.

Digital Reasoning. (2019, August 28). We found a way to empower law enforcement to identify & assist trafficked Children. Retrieved from Digital Reasoning: https://digitalreasoning.com/resources/thorn-case-study/

Dijk, J. V. (2007). Mafia markers: assessing organized crime and its impact upon societies. Trends in Organised Crime, 39–56.

Do Something. (2020). 11 Facts about Human Trafficking. Retrieved from Do Something: https://www.dosomething.org/us/facts/11-facts-about-human-trafficking

Fedorschak, K., Kandala, S., Desouza, K. C., & Krishnamurthy, R. (2014). Data Analytics and Human Trafficking. In M. C. Tremblay, D. VanderMeer, M. Rothenberger, A. Gupta, & V. Yoon (Ed.), International Conference on Design Science Research in Information Systems. Cham: Springer.

Greenemeier, L. (2015, February 8). Human Traffickers Caught on Hidden Internet. Retrieved from Scientific American: https://www.scientificamerican.com/article/human-traffickers-caught-on-hidden-internet/

Grothaus, M. (2013, May 14). How Google Is Fighting Sex Trafficking With Big Data. Retrieved February 2021, from Fast Company: https://www.fastcompany.com/3009686/how-google-is-fighting-sex-trafficking-with-big-data

Hazy Research Group. (2016, February 18). MEMEX / Human trafficking. Retrieved from DeepDive: http://deepdive.stanford.edu/showcase/apps

Hodal, K. (2019, February 25). One in 200 people is a slave. Why? Retrieved from The Guardian: https://www.theguardian.com/news/2019/feb/25/modern-slavery-trafficking-persons-one-in-200

Kejriwal, M., & Kapoor, R. (2019). Network-theoretic information extraction quality assessment in the human trafficking domain. Applied Network Science, 1–26.

Kejriwal, M., & Szekely, P. (2015). Knowledge Graphs for Social Good: An Entity-centric Search Engine for the Human Trafficking Domain. Journal of Latex Class Files, 1–15.

Marinus Analytics. (2020). Marinus Analytics. Retrieved from Traffic Jam: https://www.marinusanalytics.com/traffic-jam

OMCTP. (2021, January 20). Human Trafficking Hotlines: Fact Sheet. Retrieved from US Department of State: https://www.state.gov/human-trafficking-hotlines/

Polaris. (2019, December 31). Hotline Statistics. Retrieved from National Human Trafficking Hotline: https://humantraffickinghotline.org/states

Portnoff, R. S. (2018). The Dark Net: De-Anonymization, Classification and Analysis. Berkeley: University of California.

Savona, E. U., & Stanizzi, S. (2007). Measuring Human Trafficking: Complexities and Pitfalls. New York: Springer.

Scott, J. T., Ingram, A. M., Nemer, S. L., & Crowley, D. M. (2019). Evidence-Based Human Trafficking Policy: Opportunities to Invest in Trauma-Informed Strategies. The American Journal of Community Psychology, 348–358.

UNODC. (2004). United Nations Convention Against Transnational Organised Crime and the Protocols Thereto. United Nations Office on Drugs and Crime (pp. 41–52). New York: United Nations.

UNODC. (2020). Global Report on Trafficking in Persons. Vienna: United Nations.

Walk Free Foundation. (2018). Global Findings. Nedlands, Western Australia: Walk Free Foundation.
