When karma?

Mateusz Zatorski
Apr 8, 2015

So I was bored one day and I thought it would be great to learn R.
What is R? R is a very powerful programming language for computational statistics, visualisation and data science.

Here is a bit of background on why I even started with R (you can skip the next paragraph to get to the interesting bit, if you want to).

A few years ago I studied Biophysics at Jagiellonian University in Cracow, Poland. One of the courses was Statistical Analysis, and my problem with it was that I could not understand why I would need to know statistics. Sure, I could calculate the probability of me winning the lottery, but why would this be useful for me? I was not interested… I still studied everything carefully. Looking back now, that was a great decision.

So how do I learn a new language?

I just build something using this language. But what am I going to build? I spend a lot of time on Hacker News. From time to time I post something to HN. I was always wondering how I can get my post onto the front page. Obviously it must be interesting for others, so they will upvote it. But is that the only reason? Let’s use “statistics” to find out.

I need the data!

To analyse something I need data. My first thought was to scrape the Hacker News pages, but obviously that would be painful and take time, and it is probably against the HN terms and conditions. But hey, there is the HN API, which surely I could use. There is another BUT: API limits… I didn’t want to spend much time on this “project”.

Luckily I found this cool page: hckr news, by Wayne Larsen. It has loads of cool features, but at the time I was not looking for a great way to browse HN. I needed the data. So I started playing around with the hckr news page and…

Scrolling down the page with dev tools open gives me more info about where the data is coming from. Luckily for me the files come from http://hckrnews.com/data/, so I go there and this is what I see:

Perfect! Exactly what I need. This is what each file looks like:

Each file has all of the stories which landed on the front page that day. And the data reaches back as far as 6th September 2010. So this is all I need! I have the data.

I have made a repo, https://github.com/knowbody/HNdump, where you can get all of this data, so the hckr news page won’t be hit too many times.

What did I find out?

In my super hacky repo, you can find some of the code, where I use a mix of Python and R. Probably not the best R code, but it does the job. The reason I use Python is to prepare the data for R. R is great at what it does, but one of the downsides is that explicit “for” loops can be very slow in R.
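
To give a rough idea of what “preparing the data for R” can look like, here is a minimal sketch in Python. It is not the code from my repo: the sample records and the field names `posted_at` and `points` are made up for illustration, and the real hckr news files may well look different. The idea is just to flatten the stories into a CSV that R can load with `read.csv()`.

```python
import csv
import io

# Hypothetical sample records. The field names "posted_at" and "points"
# are assumptions for illustration, not the real hckr news schema.
stories = [
    {"posted_at": "2015-04-06T16:00:00", "points": 120},
    {"posted_at": "2015-04-07T09:30:00", "points": 45},
]


def stories_to_csv(records, out):
    """Write one CSV row per story to the file-like object `out`."""
    writer = csv.DictWriter(out, fieldnames=["posted_at", "points"])
    writer.writeheader()
    for record in records:
        # Keep only the two columns we care about.
        writer.writerow(
            {"posted_at": record["posted_at"], "points": record["points"]}
        )


# Write into an in-memory buffer; a real script would open a file instead.
buffer = io.StringIO()
stories_to_csv(stories, buffer)
csv_text = buffer.getvalue()
```

On the R side, a file written like this could then be read in one go with `read.csv()`, with no “for” loop needed.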

I was curious what day of the week and what time is best to post on Hacker News. Here are the graphs:

The y axis is the number of articles per day of the week, and the x axis is the day of the week.
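
Counting articles per day of the week is simple enough to sketch in a few lines of Python. Again, this is an illustration and not the code from my repo; the `posting_dates` list is made-up sample data, and I assume each story has already been reduced to the date it was posted on.

```python
from collections import Counter
from datetime import date

# Fixed English names, so the result does not depend on the system locale.
WEEKDAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]

# Hypothetical sample data: one entry per story that hit the front page.
posting_dates = [
    date(2015, 4, 6),
    date(2015, 4, 6),
    date(2015, 4, 7),
    date(2015, 4, 8),
]


def count_by_weekday(dates):
    """Return a Counter mapping weekday name to number of stories."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return Counter(WEEKDAY_NAMES[d.weekday()] for d in dates)


counts = count_by_weekday(posting_dates)
for name in WEEKDAY_NAMES:
    print(name, counts[name])
```

A `Counter` returns 0 for days with no stories, so the loop at the end prints all seven days even if some of them are missing from the data.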

And the second one, checking the time:

This is “zoomed” to the data which was interesting for me.
The y axis shows the upvotes and the x axis shows the time.
The x axis is a bit “wobbly”: you need to take away the leading ‘1’ in the number, so 1160000 becomes 16:00:00, which gives you the time. It was just a hack I did in a couple of hours, so forgive me.
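
If you want to turn those axis numbers back into something readable, the trick described above can be sketched like this in Python. The function name `decode_axis_time` is made up for this example. I am also assuming every value is exactly seven digits (the leading ‘1’ followed by HHMMSS), which holds for 1160000 but is otherwise an assumption.

```python
def decode_axis_time(value: int) -> str:
    """Turn an axis value like 1160000 into a time string like 16:00:00."""
    digits = str(value)
    # Assumption: a leading '1' followed by exactly six digits (HHMMSS).
    if len(digits) != 7 or digits[0] != "1":
        raise ValueError(f"expected a 7-digit number starting with 1, got {value}")
    hours, minutes, seconds = digits[1:3], digits[3:5], digits[5:7]
    return f"{hours}:{minutes}:{seconds}"


print(decode_axis_time(1160000))  # prints 16:00:00
```

One side effect of the extra ‘1’ is that early-morning times keep their leading zeros when stored as a number, so 1093000 decodes to 09:30:00.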

These are just the very basics and you can do much more using R. You could definitely combine those two graphs. Or use a different, better visualisation of the data.

What have I learnt?

I managed to learn R syntax and a couple of its features. For people who are starting with the R language I highly recommend these tutorials. And also R’s documentation.

I love science and I’m planning to do something cooler than this hack. I was thinking about using NASA or Twitter data. I might cover it in a future blog post.

Take away

I think the main takeaway for you guys is the code and the data.
Here is the HN dump data: https://github.com/knowbody/HNdump
And the ugly code for whenkarma: https://github.com/knowbody/whenkarma
