Iconoclast — a child prostitution hunter

Afik Cohen
aphex.cx
July 22, 2016

I recently talked to Thorn, a nonprofit that uses technology and data science to find and rescue sexually abused children. It turns out that over 75% of child prostitution is advertised online, mostly on backpage.com (essentially a seedier Craigslist).

Their Innovation Lab here in San Francisco is hiring engineers to help in that mission. Incredibly, thanks to their work in collaboration with law enforcement agencies worldwide, they’ve brought down the time it takes to rescue children from years to a matter of days.

At our first meeting, I nearly broke down in tears.

Their lead engineer told me a story about a girl recently rescued in Europe. She was being pimped out online by her father. Using a variety of techniques, including analyzing the listing’s text and the backgrounds of the associated explicit photos (which let them determine the hotel room the photos were taken in) and cross-referencing this data with their existing database, they were able to rescue the girl from her abuser. They also saved her little sister, who was not yet old enough to be abused.

I had to help.

So I built Iconoclast — a system that finds every underage Backpage.com listing in the world.

How does it work?

Iconoclast deploys millions of bots to scrape all of the escort listings on Backpage and uses image recognition to figure out which ones have underage photos.

The Backpage.com homepage.

That’s the homepage. There’s one for every major city, à la Craigslist. Once I had a bot that could read one listing, handling all of them only took a bit of traversal code.
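The traversal idea can be sketched roughly like this. It is a minimal illustration, not Iconoclast's actual code: `Listing`, `Crawler`, `cityPages`, `listingLinks`, and `readListing` are hypothetical names, and the functions that fetch pages are passed in as parameters so the sketch does not depend on any particular HTTP or HTML library.

```scala
// Hypothetical sketch: the names and the three-level structure
// (city pages -> listing links -> listings) are assumptions for illustration.
final case class Listing(url: String, title: String, photoUrls: List[String])

final class Crawler(
    cityPages: () => List[String],          // URLs of every city homepage
    listingLinks: String => List[String],   // city page URL -> listing URLs
    readListing: String => Option[Listing]  // listing URL -> parsed listing, if readable
) {
  // Visit every city, follow each distinct listing link once,
  // and keep only the listings that could be parsed.
  def crawl(): List[Listing] =
    cityPages()
      .flatMap(listingLinks)
      .distinct
      .flatMap(url => readListing(url).toList)
}

object CrawlerDemo {
  def main(args: Array[String]): Unit = {
    // In-memory stand-ins for real pages.
    val links = Map(
      "city-a" -> List("a/1", "a/2"),
      "city-b" -> List("b/1", "a/2") // the same link can show up under two cities
    )
    val crawler = new Crawler(
      cityPages = () => links.keys.toList.sorted,
      listingLinks = city => links.getOrElse(city, Nil),
      readListing = url =>
        if (url == "b/1") None // pretend this page failed to parse
        else Some(Listing(url, s"title of $url", Nil))
    )
    println(crawler.crawl().map(_.url)) // prints List(a/1, a/2)
  }
}
```

Keeping the "read one listing" step as a plain function is what makes the traversal cheap to add: the crawler only needs to know how to get from a city page to listing URLs.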

Getting the photos

Scraping is not a new art, but Scala Scraper makes it so darn easy. The scraper I wrote to convert each Backpage escort listing into a simple case class ended up being fewer than ten lines of Scala.
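For a sense of what that looks like, here is a rough sketch using Scala Scraper's DSL. It is not the original scraper: the `Listing` fields and the CSS selectors (`h1`, `img[src]`) are assumptions made for illustration and do not reflect Backpage's real markup. It needs the scala-scraper library on the classpath.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

// Hypothetical shape of a scraped listing.
final case class Listing(url: String, title: String, photoUrls: List[String])

object ListingScraper {
  private val browser = JsoupBrowser()

  // Parse already-downloaded HTML into a Listing.
  // The selectors are placeholders, not the site's actual structure.
  def parse(url: String, html: String): Listing = {
    val doc = browser.parseString(html)
    Listing(
      url = url,
      // >?> yields None instead of throwing when nothing matches
      title = (doc >?> text("h1")).getOrElse(""),
      // only <img> elements that actually carry a src attribute
      photoUrls = (doc >> elementList("img[src]")).map(_.attr("src"))
    )
  }
}
```

To scrape a live page instead of a string, `browser.get(url)` returns a document that can be queried the same way.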

All around me are familiar faces…

Remember last year when everyone on Facebook was posting these?

With apologies to my friends who are the unfortunate subjects of this photo. They’re not this old.

Well, Microsoft Cognitive Services has a commercial version of this. It’s called Project Oxford and it’s really, really good.

Source: Microsoft

Using their convenient API, I had a quick and pretty darn reliable way of flagging underage photos, so I wrote an analyzer service that processed the photos from each listing through it.
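The flagging step of such an analyzer might look roughly like the sketch below. Everything here is an assumption for illustration: `estimateAges` is a hypothetical stand-in for the call to the face API (taking a photo URL and returning one estimated age per detected face), and the threshold of 18 and the names `Analyzer` and `Flagged` are mine, not taken from Iconoclast.

```scala
// Hypothetical sketch of the flagging logic only; no real API call is made.
final case class Listing(url: String, title: String, photoUrls: List[String])
final case class Flagged(listing: Listing, youngestEstimate: Double)

final class Analyzer(
    estimateAges: String => List[Double], // photo URL -> estimated age per detected face
    threshold: Double = 18.0              // assumed cut-off for flagging
) {
  // Flag the listing if any face in any of its photos is estimated
  // to be younger than the threshold.
  def analyze(listing: Listing): Option[Flagged] =
    listing.photoUrls
      .flatMap(estimateAges)
      .reduceOption((a, b) => math.min(a, b))
      .filter(_ < threshold)
      .map(youngest => Flagged(listing, youngest))
}

object AnalyzerDemo {
  def main(args: Array[String]): Unit = {
    // Canned estimates in place of a real face-detection service.
    val fakeAges = Map("p1.jpg" -> List(31.0), "p2.jpg" -> List(16.5, 40.0))
    val analyzer = new Analyzer(url => fakeAges.getOrElse(url, Nil))
    val listing = Listing("example/1", "example", List("p1.jpg", "p2.jpg"))
    println(analyzer.analyze(listing).map(_.youngestEstimate)) // prints Some(16.5)
  }
}
```

Age estimates from a photo are approximate, so a sketch like this is best read as producing candidates for a person to review, not final answers.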

What about a front end?

Easy! I’d been looking for a reason to play around with Scala.js. It was only last year that I watched Li Haoyi build an interactive Scala.js webapp in half an hour at Scala Days. Thanks to the new optimizations built into the Scala.js toolchain, you can write real-world Scala, use real-world libraries (I used React), and still generate tiny, performant, and correct JavaScript!

By following the excellent Scala.js Single Page App tutorial, I was able to have a beautiful webapp up and running in an afternoon. 👍 It displays statistics about the listings the system is currently analyzing, along with a browser for the listings it has flagged as underage.

Run it yourself!

Here are the repos for Iconoclast’s 4 components:

And here’s a spiffy diagram of Iconoclast’s architecture:

Originally published at www.aphex.cx on July 22, 2016.
