Kevin Burton
Published in Datastreamer
3 min read · Nov 5, 2018


Google’s Web Crawler Inadvertently Got CIA Spies Killed

The Telegraph and Yahoo News are reporting that Google’s web crawler (Googlebot) inadvertently got 30 CIA operatives killed. A misconfigured website allowed Google to crawl and index its content, which added it to Google's search corpus.
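The reports don't say exactly what the misconfiguration was, so the following is only an illustration of one way a site tells well-behaved crawlers to stay away: a `robots.txt` file. The sketch uses Python's standard `urllib.robotparser` to check whether a crawler would be allowed to fetch a page. The `robots.txt` contents, the `crawler_may_fetch` helper, and the `example.com` URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that wants to stay out of search indexes.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# With the blanket Disallow rule, a compliant crawler skips the page.
blocked = crawler_may_fetch(ROBOTS_TXT, "Googlebot", "https://example.com/private/page.html")

# With no rules at all (an empty or missing file), everything is crawlable.
open_site = crawler_may_fetch("", "Googlebot", "https://example.com/private/page.html")
```

Note that `robots.txt` is advisory. It only keeps compliant crawlers out, so anything truly sensitive needs real access control such as authentication.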

“We’re still dealing with the fallout,” said one former national security official. “Dozens of people around the world were killed because of this.”

This happened back in 2011, so it isn’t something actively happening today, but it’s still rather disturbing.

At Datastreamer we fetch about 500GB of content per day via HTTP, which yields about 100GB of fresh, unique content. We license the derived content to companies around the world, which use it in search engines and to build new and compelling applications without having to worry about data collection.
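One common reason fetched volume shrinks so much is that the same content gets fetched more than once. A minimal sketch of exact-duplicate removal by content hash, assuming that approach; this is not a description of Datastreamer's actual pipeline, and `unique_documents` and the sample data are hypothetical.

```python
import hashlib


def unique_documents(documents):
    """Yield each document body whose exact bytes have not been seen before."""
    seen = set()
    for body in documents:
        # Store a fixed-size digest rather than the full body to save memory.
        digest = hashlib.sha256(body).digest()
        if digest not in seen:
            seen.add(digest)
            yield body


# Five fetches, but only three distinct documents.
fetched = [b"story A", b"story B", b"story A", b"story A", b"story C"]
fresh = list(unique_documents(fetched))
```

Exact hashing only catches byte-identical copies. Near-duplicates, such as the same article wrapped in different page templates, would need a fuzzier technique.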

We have a very specific Terms of Service which prohibits predatory use of the content. No email harvesting, no spamming, and certainly nothing that would hurt anyone.

I have to admit though, this issue keeps me up at night.

Doxing is a very real problem. Every little piece of information you provide about yourself online is another clue that someone can use to unlock your real-world persona.

Even if you don’t disclose this data publicly, an attacker could hack into Google or Twitter or even your ISP to obtain your IP address and other sensitive data.

With the Cambridge Analytica incident, this is now a clear, real-world threat to more people.

It’s long been obvious to anyone involved in social media analysis, but it’s becoming more and more visible to everyone. That is actually a good thing, as people are able to properly align their paranoia with the real-world threats they can expect to encounter.

There aren’t very good solutions here. The main problem is that people simply refuse to practice good information hygiene.

This CIA incident was from 2011. PGP has existed in a usable form since 1995 (and as early as 1991) and could have been used to protect the individuals involved.

If the UI wasn’t perfect, the CIA could have fixed it. They have the resources.

Facebook, Google, and Twitter aren’t really in a position to fix this problem. They’re PART of the problem. They benefit from your data being in the clear.

We’re going to need better forms of group encryption so that people you communicate with can receive your messages securely.

There are real-world implications here even if you’re not a CIA spy. China, Russia, and others could be actively using your content for industrial espionage, for example.

That's not a very likely situation for most of us, but if you work for Google, Microsoft, or any other valuable company, you’re definitely on their radar.

At the very minimum, if you’re in a sensitive scenario, use a tool like Signal or PGP.
