Analysing 4 million sums done on Speedsums

Speedsums is a site that tests your mental arithmetic — you get 30 seconds to do as many (simple) maths problems as you can, after which you’re given a score. I encourage you to check it out if you haven’t already: www.speedsums.com. A milestone was recently reached — a total of 4 million sums have been done on Speedsums! To celebrate, I decided to t̶h̶r̶o̶w̶ ̶a̶ ̶p̶a̶r̶t̶y̶ do some data analysis.

Here’s a graph of the distribution of scores:

Sample size: 220k. As expected, the scores follow a normal distribution quite closely.

Interesting thing to note: there are unusually few instances of people getting a score of 7. I really have no idea why this is — I’m fairly certain there’s no systematic error in the way that scores are stored in the database. Let me know if you have any thoughts.

I thought it would also be interesting to see how different countries do:

The table above only shows countries which have done over 10,000 tests on Speedsums. Here’s a graph showing score distributions in a bit more detail, for countries that have done at least 1,000 tests each (a greater number of points corresponds to a greater number of tests done):

Black dots represent average scores.

Things to note:

  1. Anomaly at 7 — we can see the 7 anomaly back in action here, in the horizontal band with very few points inside it near the bottom.
  2. Glass ceiling at 46 — the US (US), Poland (PL), Britain (GB) (and others) seem to have this invisible barrier at score 46, where there’s a cluster of points with very few above it.
  3. Cluster at 60 — Great Britain (GB) has a cluster of points with score 60.
  4. Croatia (HR) — Croatia’s average score of 32.9 is almost 10 points higher than the next highest scoring country (Japan, JP).

Explanations

  1. Again, I have no idea what’s going on here.
  2. Since it’s very easy to ‘cheat’ on Speedsums by writing scripts to automate the process, I put some basic anti-cheat systems in place in the first version of the site. One of these was to assume that any score greater than 46 was done by a bot and reject it. The highest score I was getting in those days was about 40, and I liked to think I was pretty good at things like this, so surely no one could get higher than 46, right? Anyway, people eventually discovered the 46 limit, and wrote their bots to get scores of 46, hence the cluster.
  3. A few months and many misspent evenings later, I managed to get a score of 46 myself and was suddenly open to the possibility of people scoring higher than that, so I increased the maximum score to 60. A few people discovered this too, so they wrote their bots to get scores of 60, hence the cluster at 60.
  4. This is a bit more interesting. Read on.

How do you solve a problem like Croatia?

My initial reaction after seeing the chart was that someone in Croatia wrote a bot that repeatedly got, say, 42, and ran it hundreds or thousands of times. The sample size for Croatia is about 5000 tests, so if we assume the humans in Croatia actually score around the global average (~17), then roughly 3200 of those tests being runs of a bot that always scores 42 would be enough to pull the Croatia average up to 33. Looking deeper into the data, however, has led me away from the bot theory.
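For anyone who wants to check the back-of-the-envelope numbers above, here is a minimal sketch of the arithmetic. It assumes the bot runs are counted inside the ~5000 Croatian tests, and that every non-bot test scores the global average of about 17. The function names are made up for illustration and are not part of Speedsums.

```python
def bot_runs_needed(total_tests, human_mean, bot_score, target_mean):
    """Number of bot runs x (out of total_tests) that gives target_mean.

    Solves (x * bot_score + (total_tests - x) * human_mean) / total_tests
    = target_mean for x.
    """
    return total_tests * (target_mean - human_mean) / (bot_score - human_mean)


def blended_mean(total_tests, bot_runs, human_mean, bot_score):
    """Average score when bot_runs of the total_tests come from the bot."""
    human_tests = total_tests - bot_runs
    return (bot_runs * bot_score + human_tests * human_mean) / total_tests


# Assumed inputs from the text: 5000 tests, humans at ~17, bot at 42.
print(bot_runs_needed(5000, 17, 42, 33))  # 3200.0
print(blended_mean(5000, 3000, 17, 42))   # 32.0
```

So exactly 3000 bot runs would give an average of 32, and it takes about 3200 to reach 33. Either way, more than half of the Croatian tests would have to come from the bot.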

I wanted to see whether the data recorded from Croatia followed the same patterns as other data, so I decided it would be useful to try and identify the individual(s) responsible for the high scores in Croatia. Since Speedsums doesn’t require you to sign up or log in, the only thing that identifies visitors is their IP address. IP addresses often change and don’t uniquely identify people, so I instead analysed ‘sessions’ — groups of tests done on Speedsums by the same IP address with less than 5 minutes between each test (often much shorter). Workplaces and universities often have a shared IP amongst many users so it could be problematic if, say, two friends at the same university are Speedsumming at the same time. Luckily for us, it’s pretty obvious when there’s more than one person in a ‘session’:

On the left is the graph of a typical session in Croatia. On the right, however, we see much greater variance in scores about 30 minutes into the session, suggesting that there are actually 2 people of different Speedsums abilities participating in the session.
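The session grouping described above could be sketched roughly like this. It is a minimal illustration rather than the actual code behind the analysis: it assumes each test is an `(ip, timestamp, score)` tuple, and both `split_sessions` and the sample data are hypothetical.

```python
from datetime import datetime, timedelta
from itertools import groupby

SESSION_GAP = timedelta(minutes=5)  # threshold taken from the text


def split_sessions(tests, gap=SESSION_GAP):
    """Group (ip, timestamp, score) tuples into sessions.

    A session is a run of tests from one IP address in which consecutive
    tests are less than `gap` apart.
    """
    sessions = []
    # Sort by IP, then by time, so each IP's tests are contiguous and ordered.
    ordered = sorted(tests, key=lambda t: (t[0], t[1]))
    for _ip, group in groupby(ordered, key=lambda t: t[0]):
        current = []
        for test in group:
            # A gap of 5 minutes or more closes the current session.
            if current and test[1] - current[-1][1] >= gap:
                sessions.append(current)
                current = []
            current.append(test)
        sessions.append(current)
    return sessions


# Made-up sample data, purely for illustration.
t0 = datetime(2014, 1, 1, 12, 0, 0)
tests = [
    ("10.0.0.1", t0, 20),
    ("10.0.0.1", t0 + timedelta(seconds=45), 22),
    ("10.0.0.1", t0 + timedelta(minutes=10), 19),
    ("10.0.0.2", t0 + timedelta(seconds=30), 35),
]
sessions = split_sessions(tests)
print([len(s) for s in sessions])  # [2, 1, 1]
```

Grouping this way cannot tell two people on a shared IP apart, which is why the jump in score variance within a session is still needed as a second signal.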

Here’s a side-by-side comparison of a typical 30-minute session in the US (left) vs a typical 30-minute session in Croatia (right). Both have similar variance and patterns of fluctuation, suggesting that the tests done in Croatia are indeed done by humans. Of course, there is still the possibility of them being done by a very well designed bot that mimics humans, but anyone who’d want to make something like that would probably be able to think of much more interesting uses for it than to cheat anonymously on Speedsums.

All in all, I think Croatia’s high average score is probably down to sampling bias. I haven’t got any data on how Croatian users discovered Speedsums, but given that the Croatian sample size is only ~5000 tests, it’s quite possible that the site just spread among a group of people in Croatia who are really good at arithmetic.


I hope you found this at least mildly interesting. I’d love to hear your thoughts on the data or Speedsums in general — @TaimurAbdaal. Also, check out some of the other things I’ve made at www.taimur.me.