Tool: Python Script for 1st Level Coding of Semi-Structured Interviews

Shruti Grover
May 10, 2016


The Outcome

I recently sat down to analyse a set of interviews I conducted with 11 participants for a usability study for Project Balance. For every participant there was a ‘before’ and ‘after’ interview, leading to a total of 300 pages of transcripts. Working out of a small studio with minimal wall space led me to explore a tech alternative to post-it walls with Simon B Johnson.

Post-it wall made with Ross Atkin

The end result is a simple Python script for first-level coding which can be run on multiple transcripts simultaneously and outputs sorted quotes to a CSV file. This method allowed me to consider the complete interview transcripts with ease. I conducted second-level coding the traditional way, synthesising the outputs using post-its and tagging relevant quotes by their unique row IDs. I probably cut my first-level coding time by more than half. The method worked especially well since I was comparing two studies, each with a ‘before’ and ‘after’ phase.

The workflow overview was as follows:


Step 1 : Tagging Transcripts

Quotes were tagged directly in a transcript with multiple codes, using the following format:

<q code1 code2 code3> This is something interesting </q>

To make coding quicker, I created a table of all the codes and assigned a shorthand to each (for example, a code called ‘wobble reduction’ becomes ‘wr’). I then added these shorthands to my Word dictionary so that Microsoft Word would substitute the full form of the code. All the analysed files were exported as .txt files with UTF-8 encoding, which made them easier to process.
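The tag format above lends itself to a small regular expression. A minimal sketch of how tagged quotes could be pulled out of a transcript (the function name is illustrative, not necessarily the actual script):

```python
import re

# Matches <q code1 code2 ...> quote text </q>; codes are whitespace-separated.
TAG_RE = re.compile(r"<q\s+([^>]+)>(.*?)</q>", re.DOTALL)

def extract_quotes(text):
    """Return a list of (codes, quote) pairs from a tagged transcript."""
    pairs = []
    for match in TAG_RE.finditer(text):
        codes = match.group(1).split()
        quote = match.group(2).strip()
        pairs.append((codes, quote))
    return pairs
```

The non-greedy `(.*?)` keeps each match from swallowing the next tagged quote, and `re.DOTALL` lets a quote span multiple lines.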

Step 2 : Running the Script

Once the files were ready, I added the names of all the files I wanted to process to the Python script.

Step 2: Add file names to the script

On running the script, I could see which of the files were being processed and a list of my codes. This allowed me to catch mistakes, if any (for example, mis-tagging a quote as ‘values’ instead of ‘value’ would lead to an extra column in the CSV). I would then go back and correct the transcript file.

The Terminal readout
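A pipeline along these lines can be sketched as follows — read each transcript, collect the tagged quotes, and write one row per quote with a T/F column per code. The function and file names here are assumptions for illustration, not the original script:

```python
import csv
import re

# Matches <q code1 code2 ...> quote text </q>; codes are whitespace-separated.
TAG_RE = re.compile(r"<q\s+([^>]+)>(.*?)</q>", re.DOTALL)

def code_transcripts(filenames, out_path="coded_quotes.csv"):
    """Read tagged transcripts and write quotes with T/F code columns."""
    tagged = []        # list of (quote, set of codes)
    all_codes = set()  # every code seen across all files
    for name in filenames:
        print("Processing", name)  # mirrors the terminal readout
        with open(name, encoding="utf-8") as f:
            for m in TAG_RE.finditer(f.read()):
                codes = set(m.group(1).split())
                tagged.append((m.group(2).strip(), codes))
                all_codes |= codes
    columns = sorted(all_codes)  # a typo like 'values' shows up as an extra column here
    print("Codes found:", columns)
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["quote"] + columns)
        for quote, codes in tagged:
            writer.writerow([quote] + ["T" if c in codes else "F" for c in columns])
```

Printing the sorted code list before writing the CSV is what makes tagging typos visible at a glance.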

Step 3 : Sorting and Filtering

The resulting CSV file contained all the quotes in the first column. Subsequent columns represented individual tags and were marked “T” or “F”. To make the file readable I:

a) Wrapped all text

b) Added conditional formatting (if a cell equals “T”, colour it green) to colour-code the data; this allowed me to spot patterns at a glance (for example, Participant A spoke more about feature x than feature y).

c) Enabled the sort and filter function to focus on a particular participant or on a particular tag.
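The same sort-and-filter step can also be done programmatically. A minimal sketch, assuming the CSV layout described above (quote in the first column, one T/F column per code; the function name is illustrative):

```python
import csv

def quotes_with_code(csv_path, code):
    """Return all quotes marked 'T' for the given code column."""
    with open(csv_path, encoding="utf-8") as f:
        reader = csv.DictReader(f)
        return [row["quote"] for row in reader if row.get(code) == "T"]
```

This mirrors the spreadsheet filter: pick a tag column, keep only the rows marked “T”.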

I followed up with a post-it wall for second-level coding!

Plans for the Future

I can definitely see myself revisiting my output file in the future, since it acts as a low-footprint insight bank. The next step for me might be to build an easy-to-use web interface. Having used Dedoose as well as Saturate, I felt this was a more agile way for me to work, allowing me to build on my coding hierarchy with every transcript.

The GitHub code is here.

If anyone wants to know more, here is my Twitter: shrutigr.

A very big thank you to Simon B Johnson for help on the technical bits of the python script.
