Understanding Popularity on Reddit

What makes an image popular on Reddit?

There are lots of theories about what makes something go viral: content should be emotionally compelling, it should appeal to a broad audience, etc. Some people say they can’t define it, but they know a great viral image when they see one. How accurate are these theories? Does our daily exposure to Reddit make us experts at predicting which posts will rise to internet fame? We have no idea, but we built a game to find out.

Guess the Karma

Guess the Karma is a quick game that asks users to identify the more popular post in a pair of images that were submitted to the same subreddit. It looks like this…

You see two images and the subreddit they were posted in, and we ask you to identify which was more popular (received more votes). Then we tell you whether you were right or wrong and show you the actual karma scores (upvotes minus downvotes) of each image before moving on to the next pair. In our example above, the picture on the left was more popular. Did you get it right? Try some more at GuessTheKarma.com!

Guess the Karma launched in late October 2015 and attracted about 7,600 people (2,800 from Reddit, 2,500 from digg.com, and the rest from various sources). Together, people voted on over 80,000 pairs of Reddit images from r/aww, r/pics, r/funny, and r/OldSchoolCool.

So how did everyone do?

Not good. The average accuracy was 52.5%; that is, when presented with a pair of images from Reddit, our participants correctly identified the more popular image (as measured by votes) 52.5% of the time. For context, random guessing would have given an accuracy of 50%.
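To get a feel for how far 52.5% sits from coin-flipping, here is a minimal sketch that counts the standard errors between an observed accuracy and chance. The function name `z_score_vs_chance` is made up for illustration, the 80,000 figure is used only as a rough stand-in for the number of responses, and the calculation assumes every guess is independent, which is not strictly true since each player answered many pairs.

```python
import math


def z_score_vs_chance(accuracy, n_responses, chance=0.5):
    """Number of standard errors between observed accuracy and chance.

    Assumption: the n_responses guesses are independent, which ignores the
    fact that one player answers many pairs.
    """
    if n_responses <= 0:
        raise ValueError("n_responses must be positive")
    # Standard error of a proportion if every guess were a fair coin flip.
    standard_error = math.sqrt(chance * (1 - chance) / n_responses)
    return (accuracy - chance) / standard_error


# Illustrative inputs: 52.5% accuracy, and 80,000 as a rough response count.
z = z_score_vs_chance(0.525, 80_000)
print(f"{z:.1f} standard errors above chance")
```

Under those assumptions, a small edge over 50% can still be many standard errors away from chance simply because the sample is large. That says nothing about whether the edge is big enough to matter in practice.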

Of course, this doesn’t mean that everyone got 52.5%; some players got them all right and others got them all wrong. The distribution of player accuracy looks like a bell curve around 50%:

Is Guess the Karma fair?

One objection is that lots of images have similar scores and there is no meaningful difference when popularity differs by a few votes. We completely agree, so we tried to minimize the number of pairs with small score differences (we still had to show some for research purposes). The flip-side of that logic is that when the difference between scores is large (e.g. image A has 1000 votes and image B has 10), then guessing accurately should be possible.

Does this hold up? Yes and no. When the difference between scores is tiny, the accuracy rate is basically 50%, and as the difference in karma scores increases, people get more accurate. However, the accuracy rate only increases to 59% at most, which is not a great improvement.

More concretely, we measure the difference in image popularity by the ratio of the bigger score to the smaller score. These ratios can vary from 1 (when the scores are similar, like 6 and 5) to approximately 1000 (when scores are 10,000 and 10). Looking at ratios allows us to see the effects of large vs small differences in popularity. The graph on the left shows how accuracy changes as the score ratio increases, while the graph on the right shows the distribution of score ratios in our experiment:
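The ratio measure described above can be sketched in a few lines. This is an illustration rather than the code behind the plots: the function names, the `(score_a, score_b, guessed_correctly)` tuple layout, the clamping of scores below 1, and the choice to bucket ratios by order of magnitude are all assumptions made for the example.

```python
from collections import defaultdict


def score_ratio(score_a, score_b):
    """Ratio of the bigger karma score to the smaller one (always >= 1)."""
    bigger, smaller = max(score_a, score_b), min(score_a, score_b)
    # Assumption: scores below 1 are clamped to 1 to avoid dividing by zero.
    return max(bigger, 1) / max(smaller, 1)


def magnitude_bucket(ratio):
    """0 for ratios in [1, 10), 1 for [10, 100), 2 for [100, 1000), ..."""
    bucket = 0
    while ratio >= 10:
        ratio /= 10
        bucket += 1
    return bucket


def accuracy_by_ratio(responses):
    """Average correctness per ratio bucket.

    `responses` is a list of (score_a, score_b, guessed_correctly) tuples,
    a hypothetical layout chosen for this sketch.
    """
    tallies = defaultdict(lambda: [0, 0])  # bucket -> [correct, total]
    for score_a, score_b, guessed_correctly in responses:
        tally = tallies[magnitude_bucket(score_ratio(score_a, score_b))]
        tally[0] += 1 if guessed_correctly else 0
        tally[1] += 1
    return {bucket: c / n for bucket, (c, n) in sorted(tallies.items())}


# Made-up responses, purely to show the shape of the input and output.
example = [(6, 5, True), (5, 6, False), (10000, 10, True)]
print(accuracy_by_ratio(example))
```

Bucketing by order of magnitude mirrors the idea of looking at ratios near 1, 10, 100, and 1000; finer buckets would work the same way.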

The points on the left plot represent the average accuracy for all pairs of images with a given ratio; this plot shows us that image pairs with a ratio near 1 have an accuracy slightly above 50%, while image pairs with a ratio near 10 have an accuracy of 53%, etc. Putting these results in context, when people look at a pair of images where image A has a score around 10,000 (which equates to front page glory) and image B has a score of about 10 (an image that dies in r/new), only 57% of people would identify image A as the more popular image.

(The bars around each point show how confident we are in the accuracy measurement. As the ratio gets larger, we have fewer instances of pairs with that ratio, so our uncertainty increases.)
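To see why fewer pairs means wider bars, here is a minimal sketch of one common way to put an interval around an accuracy: the normal-approximation (Wald) interval for a proportion. The post does not say which method produced the bars in the plot, so treat this as an assumption; the function name and the counts in the example are made up.

```python
import math


def accuracy_interval(correct, total, z=1.96):
    """Approximate 95% interval for an accuracy of correct / total.

    Uses the normal approximation p +/- z * sqrt(p * (1 - p) / total),
    which is an assumed method, not necessarily the one used for the plot.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Clip to [0, 1] because an accuracy cannot fall outside that range.
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Made-up counts: same accuracy, very different amounts of data.
many_low, many_high = accuracy_interval(5_000, 10_000)
few_low, few_high = accuracy_interval(50, 100)
print(f"10,000 responses: {many_low:.3f} to {many_high:.3f}")
print(f"   100 responses: {few_low:.3f} to {few_high:.3f}")
```

With the same underlying accuracy, the interval for the smaller sample is ten times wider, which is the effect the bars show at large ratios.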

Are people guessing randomly?

We were pretty surprised to learn that the accuracy rate was only between 55% and 60% even when the images had score ratios of 100 to 1000. Is this low accuracy rate due to Reddit itself or due to our participants?

One theory for the poor performance is that players didn’t care and put in little effort. That doesn’t seem like a good explanation for two reasons:

  1. People played this game for fun; a disinterested person should’ve just closed their browser and got on with their lives instead of clicking as fast as possible to complete it.
  2. We can measure the time it took someone to answer a question. We’ll call this the response time. The typical response time was between 5 and 15 seconds, so players were considering the images at least a little. Here’s the distribution of response times, separated by whether the player guessed correctly or not:

It looks pretty much the same, right? There’s little support for the theory that time and effort affected accuracy. Maybe a few people were randomly guessing, but most appeared to give it a good try. However, more time did not equate to higher accuracy.

So what now? Guess the Karma 2.0!

At a high level, there are three possible causes for the low accuracy that we observed:

  • The prediction task is fundamentally difficult because popularity on Reddit is random.
  • Reddit popularity is not random, but the people we recruited were bad at this game.
  • The game had flaws in its design.

Or it could be all of the above.

We could do a bunch of statistical analysis to try to parse more meaning from our results, but instead we decided to redesign the game and run a second experiment. We have been working hard on three major improvements:

  1. We try to generate more intelligent sets of image-pairs. GTK 2.0 features more pairs with meaningful differences between the images.
  2. We’ve expanded the set of subreddits to include some that have a more specific focus than r/pics. (If you have suggestions for other SFW subreddits, let us know!)
  3. We’re also going to ask some of you which image you prefer as well as which you think was more popular.

Guess the Karma is a collaboration between Maria Glenski and Greg Stoddard and has no affiliation with Reddit. They are both PhD students in computer science. Greg has more karma on Reddit and he is very proud of it.

