Lessons learned from putting an erroneous visualization on Reddit’s front page
So, I spent Sunday in the long-awaited sunshine playing around with some visualizations. I set out to replicate spatial.ly’s Population Lines using the conveniently available GEOSTAT data set, which contains the population density per square kilometre for (many parts of) Europe. A little while later, I had a nice little script working, put a quick submission onto Reddit, and closed the computer to get on with my coffee.
When I checked the comments a little later, it became obvious that I had gotten it slightly wrong. In particular, the southern part of Spain was messed up (which a commenter pointed out), and I had only used part of the full data set (another file held modeled, as opposed to counted, data that I had left out).
As it turned out, though, these errors were not bad enough to keep the post off the front page. And since I had submitted the viz through Reddit’s image hosting, there was no way for me to replace it. And, damn, it is frustrating to see an erroneous version of your viz floating to the front page and around the web without being able to fix it.
So, a few lessons learned for next time.
- Keep a pad of paper and make a list of any and all little doubts or questions that come up throughout the process, and explicitly cross each item off at some point. I simply got sucked into tweaking the chart with my eyes on my home region, and put off the oh-so-important bug checking for long enough that I forgot about it.
- Measure twice, submit once. A quick Google search for heat maps of population density would have made the errors obvious.
- If you can at all, deploy the viz in a way that lets you update it. Imgur and Reddit’s image hosting do not allow updates, for reasons that are good in general but inconvenient here.
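For the "measure twice" part, a check like the following could have caught the half-empty south of Spain before I hit submit. It is only a rough sketch, not what my script does: the function name `find_suspicious_regions`, the bounding boxes, the reference totals, and the 20% tolerance are all made up for illustration. The idea is simply to add up the plotted population per region and compare it against a number you looked up independently.

```python
from typing import Dict, Iterable, List, Tuple

Cell = Tuple[float, float, float]        # (lon, lat, population) of one grid cell
Box = Tuple[float, float, float, float]  # (lon_min, lat_min, lon_max, lat_max)


def find_suspicious_regions(
    cells: Iterable[Cell],
    boxes: Dict[str, Box],
    expected: Dict[str, float],
    tolerance: float = 0.2,  # hypothetical threshold, tune to taste
) -> List[str]:
    """Return the regions whose summed population is off by more than
    `tolerance` (as a fraction) from an independently looked-up total."""
    totals = {name: 0.0 for name in boxes}
    for lon, lat, pop in cells:
        for name, (lon_min, lat_min, lon_max, lat_max) in boxes.items():
            # Half-open intervals so a cell on a shared edge is counted once.
            if lon_min <= lon < lon_max and lat_min <= lat < lat_max:
                totals[name] += pop

    suspicious = []
    for name, total in totals.items():
        reference = expected[name]
        if abs(total - reference) > tolerance * reference:
            suspicious.append(name)
    return suspicious


# Toy data, not real population figures: the "south" box is mostly missing.
toy_cells = [(0.5, 0.5, 100.0), (1.5, 0.5, 10.0)]
toy_boxes = {"north": (0.0, 0.0, 1.0, 1.0), "south": (1.0, 0.0, 2.0, 1.0)}
toy_expected = {"north": 100.0, "south": 100.0}
flagged = find_suspicious_regions(toy_cells, toy_boxes, toy_expected)
```

With the toy data, only the "south" box ends up in `flagged`. Bounding boxes are crude, but for spotting a region where a whole chunk of data is missing, crude is enough.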
So all in all, yesterday’s frustration paid off with a set of well-ingrained and worthwhile lessons, as well as a piece of code that IMO is quite elegant and generalizable.