All quiet on the Stack Overflow front
A developer story
A few years ago, when working at PullReview, I started looking for some questions I could answer on Stack Overflow for two different reasons: to start “giving back” to a community that has been helping me since 2008 (anyone remember how it was before?) and also to get some recognition as a Ruby developer (shiny, shiny status).
In other words: playing the Stack Overflow game, beautifully created to reward good answers with shiny status points.
Big Queries
Very recently I started having a look again (not at Stack Overflow, that never stopped, but at questions to be answered) and I found it much more difficult to find good questions to answer. I did not think much about it until a few days ago, when I tried to check my feeling against some hard numbers.
Enter the Google BigQuery Stack Overflow data set. For those who are new to it, it’s several years of Stack Overflow data accessible through SQL in the Google BigQuery console: what looked like the perfect tool to confirm my feelings.
For the eager, my query & results are available here. I’ll detail the results below, but if you find anything incorrect about the query, please point it out to me so I can fix it (and then go hide in shame somewhere).
Mostly, I tried to see how question & answer scores have been evolving through the years available (2009 to 2016), and the results are quite interesting.
A growing knowledge base
This is what I expected: during those 8 years, the number of questions on Stack Overflow has increased dramatically, from around 350,000 in 2009 to 2.4 million in 2016! So, it’s not like I should be lacking questions to answer, but let’s look a bit further.
The good, the bad and the ugly (questions)
My next step was to look at how good the questions are. In Stack Overflow terms, that is the score of the question (we’ll talk about answers in a minute). The evolution is no less impressive:
In 2009 the average question had a score of 9. It decreased to 0.57 last year. Think about it: the average question asked in 2016 had a score below 1.
So either most new questions are really bad or the community (which also votes on questions) became much more difficult to please. The next chart sheds some light on that by looking at the percentage of questions that actually have a score of zero or less (meaning at least as many downvotes as upvotes, so bad quality questions deemed unfit for the site):
We have evolved from around 27% of “bad questions” to a staggering 64% in 2016 — a majority of the new questions asked last year are “bad questions” according to the community.
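For readers who want to redo this kind of aggregation outside of SQL, here is a minimal Python sketch of the two metrics used so far: the average question score per year and the share of questions with a score of zero or less. It is not the query behind the charts; the function name and the sample rows are made up for illustration.

```python
from collections import defaultdict


def yearly_question_stats(questions):
    """Return {year: (average_score, percent_with_score_zero_or_less)}.

    `questions` is an iterable of (year, score) pairs.
    """
    scores_by_year = defaultdict(list)
    for year, score in questions:
        scores_by_year[year].append(score)

    stats = {}
    for year, scores in sorted(scores_by_year.items()):
        average = sum(scores) / len(scores)
        # A "bad question" here is one with at least as many downvotes as upvotes.
        bad_percent = 100 * sum(1 for s in scores if s <= 0) / len(scores)
        stats[year] = (average, bad_percent)
    return stats


# Made-up sample rows, NOT real Stack Overflow data.
sample = [
    (2009, 12), (2009, 0), (2009, 5), (2009, 3),
    (2016, 0), (2016, -1), (2016, 2), (2016, 0),
]
stats = yearly_question_stats(sample)
for year, (average, bad_percent) in stats.items():
    print(year, round(average, 2), f"{bad_percent:.0f}%")
```

The same function works on any export of (year, score) pairs, so it is easy to check the percentages against a different slice of the data.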
As averages are always a bit misleading, here is the distribution of the scores in 2009 and 2016 (normalized):


In 2009, the mode was the 1–5 range, and questions with 30+ points represented 5% of the total. In 2016, the mode has shifted to the 0 category and the 30+ point questions represent just above 0.05% (a hundred times less).
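A normalized distribution like this one can be sketched in a few lines of Python. Only the "0", "1-5" and "30+" categories come from the text above; the other bucket boundaries ("negative" and "6-29") and the sample scores are assumptions made for the example.

```python
from collections import Counter


def score_bucket(score):
    """Map a question score to a bucket label (boundaries partly assumed)."""
    if score < 0:
        return "negative"
    if score == 0:
        return "0"
    if score <= 5:
        return "1-5"
    if score < 30:
        return "6-29"
    return "30+"


def normalized_distribution(scores):
    """Return {bucket: share of questions}, with shares summing to 1."""
    counts = Counter(score_bucket(s) for s in scores)
    total = sum(counts.values())
    return {bucket: count / total for bucket, count in counts.items()}


# Made-up scores for a single year, NOT real Stack Overflow data.
scores = [0, 0, 0, 1, 3, -2, 45, 0]
distribution = normalized_distribution(scores)
# The mode is simply the bucket holding the largest share.
mode = max(distribution, key=distribution.get)
print(distribution, mode)
```

Normalizing by the yearly total is what makes 2009 and 2016 comparable despite the very different number of questions.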
This may explain a bit why I have trouble finding questions to answer these days… but that’s just half of the story: we need to look at answers now.
Philosophy: Unintelligible answers to insoluble problems.
- Henry Adams
Maybe he was talking about programming? So let’s first have a look at how many of those questions actually get an accepted answer:
A sort of mirror of the previous chart, showing a large decrease in the percentage of questions getting an accepted answer, from 73% in 2009 to only 43% in 2016. This looks quite logical: if the quality of the questions decreases, there is less chance of getting valid answers, and it makes sense that most of the questions with a negative score never get answered at all.
To get a better view, let’s just look at the questions that actually get an accepted answer. On Stack Overflow, you get points for getting your answer accepted (which depends on the person owning the question), but mainly from the score of your answer (which itself depends on the whole community, as anyone can upvote/downvote your answer):
So in 2009 an accepted answer got an average score of 14. In 2016 we’re at… 1.66. So, like just the person asking and one other, in a community that became much larger.
In other words, not only is it more difficult to find good questions, but getting the accepted answer for one will only get you so far: you’ll probably not get much more than the accepted bonus (25 points) and 16 points for the 1.6 upvotes.
Looks like the Stack Overflow currency (your score) became much more difficult to acquire. To give you an idea I have a score of around 5,000 points, putting me in the “top 7% overall” according to the platform. To get this score in 2009, you would need to successfully answer 5,000 / (25 + 140) = 30 questions. In 2016, you would need more than 120 — and remember, there may be more questions, but most of them are not deemed worthy to answer at all!
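The back-of-the-envelope arithmetic above can be written out as a small Python sketch. It assumes the 25-point acceptance bonus and the 10 points per upvote implied by the figures in this post, treats the average answer score as net upvotes, and ignores every other reputation rule; the function and constant names are made up for the example.

```python
ACCEPT_BONUS = 25       # points for an accepted answer, as stated in the post
POINTS_PER_UPVOTE = 10  # implied by "16 points for the 1.6 upvotes"


def answers_needed(target_reputation, average_answer_score):
    """Estimate how many accepted answers are needed to reach a reputation.

    Simplification: the average score is treated as that many upvotes.
    """
    points_per_answer = ACCEPT_BONUS + POINTS_PER_UPVOTE * average_answer_score
    return target_reputation / points_per_answer


needed_2009 = answers_needed(5000, 14)   # 5,000 / (25 + 140)
needed_2016 = answers_needed(5000, 1.6)  # 5,000 / (25 + 16)
print(round(needed_2009, 1), round(needed_2016, 1))
```

Under these assumptions, an accepted answer was worth about four times more in 2009 than in 2016 (165 points against 41).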
So, it looks like the Stack Overflow game has become much more difficult. Winning points became harder and harder, so there was a “first mover advantage” for the early community that latecomers may never be able to overcome.
Some last words
While I found the exercise really interesting, this does not say anything about the usefulness of the platform. I search and find a lot of answers on Stack Overflow; like many developers, I get a lot of value out of it without ever having to ask a question myself. I’m not alone there: out of Stack Overflow’s 7.25M users, only 2.5M have ever asked a question!
It may actually be a sign of maturity: most of the interesting questions about each topic have been asked and answered already (even if, in an ever-evolving world such as technology, new topics will continue to appear). My guess at a viable strategy for the Stack Overflow game would be to prioritize new topics, which of course requires staying on the cutting edge for at least some of them.
So, thanks Joel & Jeff & my best wishes to Stack Overflow for the next 10 years!
