What Data Says about TechCrunch

In San Francisco, “buzzwords” like AI, the Internet of Things, and impact get thrown around a lot. But within each industry, who are the most influential players? That was the question I set out to answer, so I decided to scour the world’s most prolific online source for all things startups and technology: TechCrunch. Which buzzwords tie TechCrunch’s articles together within each startup category?

I soon realized that TechCrunch has no API for analyzing its article upon article of valuable information. No problem. With some Major Lazer, a pair of headphones, and some Peet’s Coffee, I set out to get that data myself.
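Without an API, the links have to come from the HTML of the category pages. What follows is only a minimal sketch using Python’s standard-library `html.parser`, not the script we actually used. It assumes the page HTML has already been downloaded (for example with `urllib.request`). It also assumes that article URLs follow a date-based pattern like `techcrunch.com/2016/08/04/some-slug/`. That URL pattern, the `ArticleLinkParser` class, and the `extract_article_links` function are all illustrative assumptions rather than anything TechCrunch documents.

```python
import re
from html.parser import HTMLParser

# ASSUMPTION: article URLs carry a /YYYY/MM/DD/slug path. Adjust if the site differs.
ARTICLE_URL = re.compile(r"^https?://techcrunch\.com/\d{4}/\d{2}/\d{2}/[^/]+/?$")


class ArticleLinkParser(HTMLParser):
    """Hypothetical helper: collects unique article links from <a href> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._seen = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep only hrefs that look like articles, in page order, skipping repeats.
        if href and ARTICLE_URL.match(href) and href not in self._seen:
            self._seen.add(href)
            self.links.append(href)


def extract_article_links(html):
    """Return the article links found in one page of already-downloaded HTML."""
    parser = ArticleLinkParser()
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    sample = """
    <a href="https://techcrunch.com/2016/08/04/some-story/">Story</a>
    <a href="https://techcrunch.com/category/ecommerce/">Category</a>
    <a href="https://techcrunch.com/2016/08/04/some-story/">Story again</a>
    """
    print(extract_article_links(sample))
    # ['https://techcrunch.com/2016/08/04/some-story/']
```

Filtering on the URL shape keeps navigation, category, and tag links out of the list without depending on the page’s CSS classes, which tend to change more often than URL schemes.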

My friend Mark Surnin and I wrote a simple script to extract all the article links in a TechCrunch category. I then extended and fixed up geograpy2, using NLTK and part-of-speech (POS) tagging to pull the nouns out of each article, to see which nouns were most popular across the different categories of tech.
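The counting step can be sketched independently of how the text was tagged. This is a minimal sketch, not the extended geograpy2 code. It assumes each article has already been tokenized and tagged, for example with NLTK’s `word_tokenize` and `pos_tag`, which return `(token, tag)` pairs using Penn Treebank tags by default. The function names `count_nouns` and `category_noun_counts` are hypothetical.

```python
from collections import Counter

# Penn Treebank noun tags (the tagset nltk.pos_tag uses by default):
# NN singular, NNS plural, NNP proper singular, NNPS proper plural.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def count_nouns(tagged_tokens):
    """Count the nouns in one article, given (token, tag) pairs."""
    return Counter(token for token, tag in tagged_tokens if tag in NOUN_TAGS)


def category_noun_counts(tagged_articles):
    """Sum noun counts over every tagged article in one category."""
    total = Counter()
    for tagged in tagged_articles:
        total.update(count_nouns(tagged))
    return total


if __name__ == "__main__":
    # Hand-tagged toy input standing in for real tagger output.
    articles = [
        [("Amazon", "NNP"), ("expands", "VBZ"), ("in", "IN"), ("India", "NNP")],
        [("Flipkart", "NNP"), ("competes", "VBZ"), ("with", "IN"), ("Amazon", "NNP")],
    ]
    print(category_noun_counts(articles).most_common(1))  # [('Amazon', 2)]
```

Tokens are kept case-sensitive so that proper nouns like “Apple” stay distinct from the common noun “apple”.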

The results?

Across all of the categories, one thing stood out: big companies ruled the space, with Google, Apple, and Microsoft snagging the most face time in TechCrunch’s articles. For artificial intelligence, events and breakthroughs such as Mayhem, Watson, CSC, and CTF stood out (find out more about them here, here, and here respectively). Somewhat to my surprise, Apple took third place, and IoT also claimed a pretty sizable share.

When it came to Enterprise, the nouns were more evenly distributed, with Microsoft leading the way (not surprisingly). The iPad, Android, and Google came next, while the rest of the nouns were filled out by companies we all know and love.

The surprise came in E-Commerce, featured above, which was a lot more varied. In terms of geography, E-Commerce was the only category where the US did not stand alone at the top, with China and India also prominent on the map. Craves, an iOS app, snagged one of the top spots with 1.4% of the nouns in the E-Commerce tag (yes, no single noun dominated anywhere in TechCrunch). The key players here were also very different: established companies such as Walmart and Amazon, alongside the startup Flipkart.
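A figure like Craves’ 1.4% is presumably that noun’s count divided by the total number of nouns counted in the category, times 100. This is a minimal sketch of the calculation. It assumes the per-category counts live in a `collections.Counter`, and `noun_shares` is a hypothetical name.

```python
from collections import Counter


def noun_shares(counts, top_n=10):
    """Return the top_n nouns as (noun, percent of all counted nouns) pairs."""
    total = sum(counts.values())
    if total == 0:
        return []  # no nouns counted, so there are no shares to report
    return [(noun, 100.0 * n / total) for noun, n in counts.most_common(top_n)]


if __name__ == "__main__":
    toy = Counter({"Amazon": 3, "Walmart": 1})  # made-up counts for illustration
    print(noun_shares(toy))  # [('Amazon', 75.0), ('Walmart', 25.0)]
```

Because each share is relative to the category’s own total, percentages can be compared across categories even when one category has far more articles than another.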

Using a similar analysis, IoT was split fairly evenly among a group of companies that included Intel, Cisco, Samsung, Apple, Amazon, and Google, along with technologies such as M2M and HaaS.

Verdict? Amazon and Google win in terms of face time, and E-Commerce seems the most geographically diverse. Pretty fitting, given that my first blog post showed that E-Commerce is the most prevalent startup industry in emerging economies.

Anyhow, that’s the interesting tidbit for this afternoon. I’ve also shared the code here if you want to pull some article links or nouns from TechCrunch and do some analysis of your own. Enjoy!