The Hidden Benefits of Open Data

Open data is great! It can allow for better meta-analyses and follow-up studies, and can provide great opportunities for improving methods (which I’ve been doing quite a bit of lately). Something that really sank in recently, though, is that open data can be an amazing resource for teaching.

When writing my textbook, I wanted to use real data from published papers in the examples, with the goal of re-making figures from those papers as the exercises in the book. (More about how I ended up writing a textbook in a later post.) However, I did most of the dataset hunting back in 2011 and 2012, and I was hard-pressed to find real data to use. I made it work and found data from a few different subfields of psychology, but it was much more difficult than I would’ve liked. Later on I gave a few workshops and found some better datasets, but finding good data was still difficult. Open data has become much more prominent in the last few years, though, at least compared to how things were then. More recently, at Neurohackweek we had a great tutorial on MRI image processing, using open data from the Human Connectome Project [link].

The big realization for me was coming across @openstatslab on Twitter. I think this initiative is amazing.

The basic principle is simple: “[using] open data sets from articles published in Psychological Science to help teach introductory statistics.” Kevin McIntyre, the creator of Open Stats Lab, developed activities to go with papers that were published with accompanying open datasets. Activities are categorized by statistical method (e.g., regression vs. ANOVA) and are outlined as discrete learning objectives. The original paper and dataset are linked alongside each activity to keep things simple, so teaching intro stats for psychology couldn’t be easier!

…and if all that isn’t enough, showing students open science early on could help shift the field towards more open practices.

It’s great to have these activities available, rather than being stuck with either the textbook resources or scavenging the internet for open data (like I had to do in the past). Hopefully this post makes more people aware of Open Stats Lab, so more academics can take advantage of this fantastic resource!