Flickr Credit: Jenni Waterloo

Organizing data files and analyses

Andy DeSoto
Human Behavior and Technology
3 min read · Apr 16, 2013


As I’ve progressed as a behavioral researcher, many things have become increasingly simple: conducting literature searches, organizing PDFs, writing up empirical papers. One thing, however, has remained a thorn in my side: organizing and keeping track of data files (e.g., .xlsx documents) and analyses (e.g., SPSS outputs), despite careful attention to documenting my work. If you’re anything like me, you run an analysis on a dataset, save the results somewhere, and a month down the road forget what you did and where the results went.

I’m not ready to write myself off as an incompetent, though. The real problem, I think, is that there aren’t great tools, at least in the cognitive psychology domain, for keeping track of and organizing scientific analyses. SPSS, a statistical package I use frequently, provides an example. Working with SPSS creates (at least) two separate files that need to be saved: (1) the actual data file (think of an Excel spreadsheet), and (2) the output file, where results are displayed. This can get out of hand quickly, especially when a number of analyses need to be run. To make things worse, saving a data file also modifies (and will even re-open) an output file, creating an endless cycle of opening, saving, and closing windows. What a mess!

So given that there aren’t great tools out there for data organization, what can be done? Unfortunately, it has to come down to conventions within a user’s own filesystem, which can be difficult to keep consistent, especially over long stretches of time. Regardless, here are some suggestions:

  1. Use specific naming conventions and/or label colors to denote “master” data files. These should never be modified or changed. They’re like film negatives — originals.
  2. Also denote data files that have gone into published manuscripts. These can’t be changed, either — or at least shouldn’t be — once an article goes to print.
  3. Organize your analyses by date. At least you’ll know that your thinking was slightly more advanced on April 15, 2013, than it was with the same dataset on April 15, 2012, or at least you can hope it was.
  4. In your worknotes, make references to specific files and where they’re stored. This associates natural language text and descriptions with something that may amount to just a file full of numbers.
  5. Take time once in a while (monthly? yearly?) to clean up your data, putting old files in archives where possible and marking the changes in your worknotes.
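
If you’re comfortable with a little scripting, a few of these conventions can be automated. What follows is only a rough sketch in Python, not a tool I’m claiming exists: the "MASTER_" and "working_" filename prefixes, the one-folder-per-date layout, and the function names are all assumptions made up for illustration. It marks a master file read-only (suggestion 1), starts a dated analysis folder with a writable working copy (suggestion 3), and appends a line to a worknotes file (suggestion 4).

```python
import shutil
import stat
from datetime import date
from pathlib import Path


def protect_master(path: Path) -> None:
    """Mark a master data file read-only so it is not changed by accident."""
    path.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)


def start_analysis(master: Path, analyses_root: Path, day: date) -> Path:
    """Create a folder named for the date and put a working copy of the master in it."""
    folder = analyses_root / day.isoformat()  # e.g. analyses/2013-04-15
    folder.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming convention: MASTER_x.csv becomes working_x.csv
    working_copy = folder / master.name.replace("MASTER_", "working_", 1)
    if working_copy.exists():
        raise FileExistsError(f"{working_copy} already exists; not overwriting it")
    # copyfile copies the contents only, so the copy does not inherit read-only status
    shutil.copyfile(master, working_copy)
    return working_copy


def log_note(worknotes: Path, day: date, text: str) -> None:
    """Append a dated, plain-language line to the worknotes file."""
    with worknotes.open("a", encoding="utf-8") as notes:
        notes.write(f"{day.isoformat()}  {text}\n")
```

A day’s work might then begin with something like start_analysis(Path("MASTER_recall.csv"), Path("analyses"), date.today()), followed by a log_note call saying where the working copy went. The file and folder names here are placeholders; the point is that the master stays untouched and every analysis has a date attached.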

Can you think of more? Really, in my opinion, the more compulsive you are about organizing, labeling, and keeping track of your work, the more time you’ll save (and the fewer errors you’ll make). Think of your computer’s hard drive as a garage. You’d never throw things randomly into boxes and leave them unlabeled and out of reach, at least in theory. Don’t treat your data that way, either.

Did you enjoy this article? If so, follow me on Twitter!



I'm a cognitive psychologist. I write about behavioral science, technology, local business, and baseball. All views are my own.