Big data stories in seconds: Hacker News and BigQuery

Felipe Hoffa
Nov 10, 2015 · 3 min read

After having a lot of fun with reddit’s data on BigQuery (collected by @jasonbaumgart, see the announcement and Max Woolf’s Howto), it was time to play with another forum that attracts a lot of attention: Hacker News.

Firebase hosts the official Hacker News API, and my friends at Firebase (Jenny Tong, @JamesTamplin) helped me obtain a dump of all Hacker News stories and comments since 2007. With this data in BigQuery, it was time to start querying.

Let’s start by visualizing Hacker News growth 2007–2015:

Image for post

It’s interesting to see how growth has been stagnant since 2012. Why? Not sure. In the meantime I left sample code for this and other visualizations in an IPython/Jupyter notebook. Also make sure to read the comments on the “Hacker News on BigQuery” announcement post (thx Max Woolf).

Other visualizations in said notebook include the best times to post on Hacker News to get more than, let’s say, 30 votes:

Image for post
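
As a rough illustration of the "best time to post" idea (not the notebook's actual code), the sketch below groups stories by the UTC hour they were posted and computes the share that scored above a threshold. It assumes each story is a dict with a Unix `time` and a `score`; the function name and the sample rows are made up for this example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def share_above_threshold_by_hour(stories, threshold=30):
    """Map each UTC posting hour to the fraction of its stories scoring above threshold."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for story in stories:
        hour = datetime.fromtimestamp(story["time"], tz=timezone.utc).hour
        totals[hour] += 1
        if story["score"] > threshold:
            hits[hour] += 1
    return {hour: hits[hour] / totals[hour] for hour in sorted(totals)}


# Made-up sample rows, for illustration only (not real Hacker News data).
sample_stories = [
    {"time": 1446379200, "score": 45},  # 2015-11-01 12:00 UTC
    {"time": 1446380100, "score": 3},   # 2015-11-01 12:15 UTC
    {"time": 1446400800, "score": 31},  # 2015-11-01 18:00 UTC
    {"time": 1446404000, "score": 30},  # 2015-11-01 18:53 UTC, not above 30
]

print(share_above_threshold_by_hour(sample_stories))
```

On the full dataset the same grouping would more naturally be a single BigQuery query; the Python version is only meant to show the logic.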

The most fun part of having a dataset in BigQuery is the ability to start combining it with others. For example, GitHub. When a project gets posted to the Hacker News homepage it generates a lot of attention. Can we measure this?

Image for post

I left the instructions to combine both datasets in a reddit /r/bigquery post.
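
One simple way to put a number on that attention, sketched here under assumptions rather than taken from the /r/bigquery post, is to compare a project's GitHub stars in the days before a Hacker News post with the days starting on it. The function name, the `daily_stars` mapping, and all the numbers below are hypothetical.

```python
from datetime import date, timedelta


def stars_before_and_after(daily_stars, posted_on, window_days=3):
    """Return (stars in the window before posted_on, stars in the window starting on it)."""
    before = sum(
        daily_stars.get(posted_on - timedelta(days=d), 0)
        for d in range(1, window_days + 1)
    )
    after = sum(
        daily_stars.get(posted_on + timedelta(days=d), 0)
        for d in range(window_days)
    )
    return before, after


# Made-up daily star counts for a hypothetical repository.
daily_stars = {
    date(2015, 11, 1): 4,
    date(2015, 11, 2): 6,
    date(2015, 11, 3): 5,
    date(2015, 11, 4): 210,  # day of the hypothetical Hacker News post
    date(2015, 11, 5): 95,
    date(2015, 11, 6): 30,
}

before, after = stars_before_and_after(daily_stars, date(2015, 11, 4))
print(before, after)
```

Days missing from the mapping count as zero stars, so the comparison still works for quiet projects.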

But the fun doesn’t end there! My latest experiment is looking at the story of Bitcoin through Hacker News and reddit:

Image for post

The how-to for combining the 3 datasets is in another /r/bigquery post.

There are so many more discoveries to find here! For example, I recently saw a presentation from Kodok Márton where he looks into which books are the most famous by finding Amazon links.
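
The presentation's exact method isn't described here, so the following is only a guess at how such a count could work: pull product IDs out of Amazon links in story or comment text and tally them. It assumes product links contain `/dp/<ID>` or `/gp/product/<ID>` with a 10-character alphanumeric ID; the pattern, function name, and sample text are all illustrative.

```python
import re
from collections import Counter

# Assumption: product links look like amazon.<tld>/.../dp/<ID> or
# amazon.<tld>/gp/product/<ID>, where <ID> is 10 letters or digits.
AMAZON_PRODUCT = re.compile(
    r"amazon\.[a-z.]{2,6}/(?:[^\s\"'<>]*/)?(?:dp|gp/product)/([A-Z0-9]{10})(?![A-Z0-9])",
    re.IGNORECASE,
)


def count_amazon_products(texts):
    """Count how many texts link to each Amazon product ID."""
    counts = Counter()
    for text in texts:
        # A set, so each product is counted at most once per text.
        counts.update({match.upper() for match in AMAZON_PRODUCT.findall(text)})
    return counts


# Made-up sample comments, for illustration only.
sample_texts = [
    "Loved this one: https://www.amazon.com/dp/0131103628",
    "See http://www.amazon.com/The-C-Programming-Language/dp/0131103628/ref=sr_1 again",
    "Also good: https://www.amazon.co.uk/gp/product/0262033844 "
    "and https://example.com/dp/0000000000",
]

print(count_amazon_products(sample_texts).most_common())
```

Turning the IDs into book titles would need a separate lookup, which is left out here.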

The best part? It only takes seconds to answer your questions once you find this data in BigQuery. If you’ve never done it, find out how — you’ll be running queries like these in less than 5 minutes from now.

November 9th update: Deedy made it to the Hacker News frontpage with a full analysis of these 9 years of HN.

Written by Felipe Hoffa

Developer Advocate @Google. Originally from Chile, now in San Francisco and around the world. Let’s talk data.

Google Cloud - Community

A collection of technical articles published or curated by Google Cloud Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.