Tool: Python Script for 1st Level Coding of Semi-Structured Interviews

Shruti Grover
May 10, 2016 · 4 min read
The Outcome

I recently sat down to analyse a set of interviews I conducted with 11 participants for a usability study for Project Balance. For every participant there was a ‘before’ and an ‘after’ interview, leading to a total of 300 pages of transcripts. Working out of a small studio with minimal wall space led me to explore a tech alternative to post-it walls with Simon B Johnson.

Post-it wall made with Ross Atkin

The end result is a simple Python script for first level coding which can be run on multiple transcripts at once and outputs sorted quotes to a CSV file. This method allowed me to consider the complete transcript of each interview with ease. I conducted second level coding the traditional way, synthesising the outputs using post-its and tagging relevant quotes with their unique row ID numbers. I probably cut my first level coding time by more than half. This method worked especially well since I was comparing two studies, each with a ‘before’ and ‘after’ phase.

The workflow overview was as follows:


Step 1 : Tagging Transcripts

Quotes were tagged directly in the transcript with multiple codes. To code, I used the following format:

<q code1 code2 code3> This is something interesting </q>
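To show how tags in this format could be read back out, here is a minimal sketch using a regular expression. It is an illustration only, not the script linked at the end of this post; the names `QUOTE_PATTERN` and `extract_quotes` are hypothetical.

```python
import re

# Matches <q code1 code2> quote text </q>.
# DOTALL lets a quote span several lines of the transcript.
QUOTE_PATTERN = re.compile(r"<q\s+([^>]*)>(.*?)</q>", re.DOTALL)


def extract_quotes(text):
    """Return a list of (codes, quote) pairs found in a tagged transcript."""
    results = []
    for match in QUOTE_PATTERN.finditer(text):
        codes = match.group(1).split()
        # Collapse line breaks and repeated spaces inside the quote.
        quote = " ".join(match.group(2).split())
        results.append((codes, quote))
    return results


sample = "Intro. <q code1 code2 code3> This is something interesting </q> Outro."
print(extract_quotes(sample))
# [(['code1', 'code2', 'code3'], 'This is something interesting')]
```

Text outside the tags is ignored, so the transcript itself stays readable and can be re-tagged at any time.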

Step 2 : Running the Script

Once the files were ready, I added the names of all the files I wanted to process to the Python script.

Step 2: Add file names to the script

On running the script, I could see which files were being processed and a list of my codes. This allowed me to catch mistakes, if any (for example, mis-tagging as ‘values’ instead of ‘value’ would lead to an extra column in the CSV). I would then go back and correct the transcript file.
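A readout like this can be approximated by counting every code across the transcripts, so that a stray ‘values’ stands out next to ‘value’. The sketch below is an assumption about how that might look, not the original script; the file names, contents and the `count_codes` helper are made up for illustration, and the transcripts are held in memory rather than read from disk.

```python
import re
from collections import Counter

# Captures the list of codes inside an opening <q ...> tag.
TAG_PATTERN = re.compile(r"<q\s+([^>]*)>")


def count_codes(transcripts):
    """Count how often each code appears across {filename: text} transcripts."""
    counts = Counter()
    for name, text in transcripts.items():
        print("Processing", name)
        for tag in TAG_PATTERN.finditer(text):
            counts.update(tag.group(1).split())
    return counts


# Hypothetical file names and contents, for illustration only.
transcripts = {
    "p01_before.txt": "<q value> I liked it </q> <q value cost> Too pricey </q>",
    "p01_after.txt": "<q values> Worth it now </q>",
}

counts = count_codes(transcripts)
for code, n in sorted(counts.items()):
    print(code, n)
```

Here ‘values’ appears once alongside ‘value’, which is the kind of mis-tag worth fixing in the transcript before generating the CSV.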

The Terminal readout

Step 3 : Sorting and Filtering

The resulting CSV file contained all the quotes in the first column. Subsequent columns represented individual tags and were marked “T” or “F”. To make the file readable I:

a) Wrapped all text

b) Added conditional formatting (if equal=T, then colour green) to colour code the data. This allowed me to spot patterns at a glance (example: Participant A spoke more about feature x vs feature y).

c) Enabled the sort and filter function to focus on a particular participant or on a particular tag.
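The CSV layout described in this step (quote in the first column, then one “T”/“F” column per tag) could be produced along these lines. This is a sketch under my own assumptions, not the original script: the `write_quote_table` helper, the `quote` header and the sample rows are hypothetical, and it writes to an in-memory buffer instead of a file.

```python
import csv
import io


def write_quote_table(rows, out):
    """Write (codes, quote) pairs as CSV: quote first, then one T/F column per code."""
    # Every distinct code becomes a column, in alphabetical order.
    all_codes = sorted({code for codes, _ in rows for code in codes})
    writer = csv.writer(out)
    writer.writerow(["quote"] + all_codes)
    for codes, quote in rows:
        flags = ["T" if code in codes else "F" for code in all_codes]
        writer.writerow([quote] + flags)
    return all_codes


# Hypothetical tagged quotes, for illustration only.
rows = [
    (["value", "cost"], "Too pricey"),
    (["value"], "I liked it"),
]

buffer = io.StringIO()
write_quote_table(rows, buffer)
print(buffer.getvalue())
```

Keeping one column per tag is what makes the spreadsheet's sort, filter and conditional formatting features useful without any further scripting.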

I followed up with a post-it wall for second level coding!

Plans for Future

I can definitely see myself revisiting my output file in the future, since it acts as a low-footprint insight bank. The next step for me might be to work on an easy-to-use web interface. Having used Dedoose as well as Saturate, I felt that this was a more agile way for me to work, allowing me to build on my coding hierarchy with every transcript.

The code is on GitHub here.

If anyone wants to know more, here is my Twitter: shrutigr

A very big thank you to Simon B Johnson for help on the technical bits of the python script.

