I recently talked to Thorn, a nonprofit that uses technology and data science to find and rescue sexually abused children. It turns out that over 75% of child prostitution is advertised online, mostly on backpage.com (essentially a seedier Craigslist).
Their Innovation Lab here in San Francisco is hiring engineers to help with that mission. Incredibly, thanks to their work in collaboration with law enforcement agencies worldwide, they’ve brought the time it takes to rescue children down from years to a matter of days.
At our first meeting, I nearly broke down in tears.
Their lead engineer told me a story about a girl recently rescued in Europe. She was being pimped out by her father online. Through a variety of techniques including analyzing the listing’s text and the backgrounds of the associated explicit photos (and determining the hotel room they were taken in) and cross-referencing this data with their existing database, they were able to rescue the girl from her abuser — and also save her little sister, who was not yet old enough to be abused.
I had to help.
So I built Iconoclast — a system that finds every underage Backpage.com listing in the world.
How does it work?
Iconoclast deploys millions of bots to scrape all of the escort listings on Backpage and uses image recognition to figure out which ones have underage photos.
That’s the homepage. There’s one for every major city, à la Craigslist. By building a bot to read one listing, I could handle all of them by adding a bit of traversal code.
Getting the photos
Scraping is not a new art, but Scala Scraper makes it so darn easy. The scraper I wrote to convert each escort listing off of Backpage into a simple case class ended up being less than ten lines of Scala.
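To give a feel for how little code that takes, here is a minimal sketch of a listing parser written with Scala Scraper. It is not the actual Iconoclast scraper: the `Listing` fields, the `ListingScraper` object, and the CSS selectors are all placeholders I made up for illustration, and it assumes the scala-scraper dependency is already on the classpath.

```scala
// Assumes scala-scraper is on the classpath (group "net.ruippeixotog", artifact "scala-scraper").
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

// Hypothetical shape of a scraped listing; the real case class may differ.
final case class Listing(title: String, body: String, imageUrls: List[String])

object ListingScraper {
  private val browser = JsoupBrowser()

  // Parses the HTML of one listing page into a Listing.
  // The selectors below are placeholders, not the site's real markup.
  // Note: `>>` fails if a selector matches nothing; `>?>` returns an Option instead.
  def parse(html: String): Listing = {
    val doc = browser.parseString(html)
    Listing(
      title = doc >> text("h1"),
      body = doc >> text(".posting-body"),
      imageUrls = (doc >> elementList("img")).map(_.attr("src"))
    )
  }
}
```

Parsing from a string keeps the sketch easy to try against a saved page; `browser.get(url)` would fetch a live page instead.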
All around me are familiar faces…
Remember last year when everyone on Facebook was posting these?
Using their convenient API, I had a quick and pretty darn reliable way of flagging underage photos, so I wrote an analyzer service that processed the photos from each listing through it.
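The flagging step itself can be sketched in a few lines. This is only an illustration under assumptions: `AgeEstimator` is a hypothetical stand-in for the third-party API client, `FaceEstimate` is an invented result type, and the threshold of 18 is a guess at a sensible cut-off, not necessarily what the real analyzer uses.

```scala
// Hypothetical result of one face-analysis call: an estimated age per detected face.
final case class FaceEstimate(age: Double)

// Stand-in for the third-party API client; the real service and its
// response format are not modeled here.
trait AgeEstimator {
  def estimate(imageUrl: String): List[FaceEstimate]
}

object ListingAnalyzer {
  // Illustrative cut-off (assumption).
  val AgeThreshold = 18.0

  // URLs of photos in which at least one detected face is estimated below the threshold.
  def flaggedPhotos(imageUrls: List[String], estimator: AgeEstimator): List[String] =
    imageUrls.filter(url => estimator.estimate(url).exists(_.age < AgeThreshold))

  // A listing is flagged if any of its photos is flagged.
  def isFlagged(imageUrls: List[String], estimator: AgeEstimator): Boolean =
    flaggedPhotos(imageUrls, estimator).nonEmpty
}
```

Keeping the estimator behind a trait means the flagging logic can be exercised with a stub, without calling the remote service.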
What about a front end?
By following the excellent Scala.js Single Page App tutorial, I was able to have a beautiful webapp up and running in an afternoon. 👍 It displays statistics about the listings the system is currently analyzing, as well as a browser for the listings it has flagged as underage.
Run it yourself!
Here are the repos for Iconoclast’s 4 components:
- API (https://github.com/aphexcx/iconoclast-api): Serves API requests to all the other components. Run it with `sbt run` to launch the API server on port 9000. You'll need a running MongoDB instance for it; it's super easy to install one by following this tutorial. No need to configure it: the default port of 27017 is fine.
- Webapp (https://github.com/aphexcx/iconoclast-webapp): Serves the Scala.js frontend. Run it with `sbt "run 8080"` to launch the webapp on port 8080. Requires the API server to be up.
- Scraper (https://github.com/aphexcx/iconoclast-scraper-backpage): Runs the Backpage scraper actors. Run it with `./start.sh` (which just waits for the API server to come up and then launches `sbt run`). Communicates with the API.
- Analyzer (https://github.com/aphexcx/iconoclast-analyzer): Runs the face recognition jobs on the images. It grabs them from the image collection in the database via the API, so it requires the API server to be up. `sbt run` launches it.
And here’s a spiffy diagram of Iconoclast’s architecture:
Originally published at www.aphex.cx on July 22, 2016.