What Is This Github You Speak Of?

Silver Keskkula
Sep 7, 2016

I fired up Google's BigQuery on the GitHub Archive and looked at 332 million GitHub push events (51.5s elapsed, 31.5 GB processed) to public repositories since February 2011.

Every contributor looks like a new one initially, as the Archive has no history of who contributed before 12 Feb 2011, but the effect disappears quickly and we can see a pretty healthy growth curve.
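To make the "new contributor" idea concrete, here is a minimal sketch in plain Python rather than the BigQuery query I actually ran. It assumes push events have already been reduced to (day, actor) pairs; the function name and the sample data are made up for illustration. An actor counts as new on the earliest day they show up, which is exactly why everyone looks new on the first day of the archive.

```python
from collections import Counter
from datetime import date


def new_pushers_per_day(events):
    """Count actors whose first observed push falls on each day.

    `events` is an iterable of (day, actor) pairs. An actor is "new"
    on the earliest day they appear in the data.
    """
    first_seen = {}
    for day, actor in events:
        # Keep the earliest day seen for each actor, regardless of input order.
        if actor not in first_seen or day < first_seen[actor]:
            first_seen[actor] = day
    return Counter(first_seen.values())


# Hypothetical sample data, not taken from the archive.
sample_events = [
    (date(2011, 2, 12), "alice"),
    (date(2011, 2, 12), "bob"),
    (date(2011, 2, 13), "alice"),
    (date(2011, 2, 13), "carol"),
    (date(2011, 2, 14), "bob"),
]

for day, count in sorted(new_pushers_per_day(sample_events).items()):
    print(day, count)
```

In the sample, both pushers on the first day count as new simply because nothing earlier exists, while the third day has no new pushers at all because bob was already seen.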

The most staggering statistic is the average of about six thousand NEW people pushing code to GitHub every day, or roughly 42,000 a week. Put another way:

More NEW people push code to GitHub in a week than graduate with a computer science degree in a year in the US!

The counts of pushes and repositories, however, are growing pretty much hand in hand.

That suggests most people are pushing to very few repositories. In fact, the average number of repositories each user contributes to is just 1.05.
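For anyone wondering what that average means, here is a rough sketch of one way to compute it, again in plain Python and not the query I ran. It assumes pushes are available as (actor, repo) pairs; the function name and the sample data are hypothetical. The idea is to collect the distinct repositories per user and then average the set sizes.

```python
from collections import defaultdict


def average_repos_per_user(pushes):
    """Mean number of distinct repositories pushed to per user.

    `pushes` is an iterable of (actor, repo) pairs; repeated pushes
    to the same repository count once per user.
    """
    repos_by_user = defaultdict(set)
    for actor, repo in pushes:
        repos_by_user[actor].add(repo)
    if not repos_by_user:
        return 0.0
    total_repos = sum(len(repos) for repos in repos_by_user.values())
    return total_repos / len(repos_by_user)


# Hypothetical sample data, not taken from the archive.
sample_pushes = [
    ("alice", "alice/homework"),
    ("alice", "alice/homework"),
    ("bob", "bob/dotfiles"),
    ("carol", "carol/site"),
    ("carol", "some-org/library"),
]

print(average_repos_per_user(sample_pushes))
```

Here alice and bob each touch one repository and carol touches two, so the average is 4/3. A figure as low as 1.05 means almost everybody sits in the "one repository" group.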

With all the Coursera homework out there I guess this is expected, but part of me was hoping to see more collaboration on average.

I jumped into this hoping to learn something about the changing geography of git pushes for Teleport, but it looks like the GitHub Archive guys have not made the location field available in the push history. Given the redundancy I can certainly understand why, but wouldn’t it be cool…

Maybe you can help me get their attention by heart’ing this post, or perhaps someone feels like pushing the limits of the GitHub API for location queries? ;)

Refs:

  1. https://github.com/
  2. https://www.githubarchive.org/
  3. http://www.geekwire.com/2014/analysis-examining-computer-science-education-explosion/
  4. https://cloud.google.com/bigquery/

Silver Keskkula

Entrepreneurial monkey coding for fun, 2 exits, first researcher on the Skype core team, lived in 11 countries