Semi-Automated Coding for Qualitative Research: A User-Centered Inquiry and Initial Prototypes

Megh Marathe
Human-Centered Computing Across Borders
May 18, 2018

Qualitative researchers rely on methods such as interviews, field observation, and document analysis for data collection, which produce large amounts of textual data in the form of interview transcripts, field notes, and organizational reports. A single hour-long interview, for example, results in over twelve pages of transcribed text.

Researchers begin data analysis with an important and painstaking annotation process known as coding**, in which short labels, or codes, are assigned to chunks of text to indicate something about their content. Researchers cherish the process of coding because it helps them surface significant patterns and build a nuanced understanding of their data.

However, coding is time- and effort-intensive: it requires several hours of highly skilled researcher attention per interview and becomes prohibitive for large datasets. Prior work in human-computer interaction (HCI) and natural language processing (NLP) has sought to help by enabling new forms of analysis, but without focusing on researchers’ existing coding practices and needs.

Therefore, our paper asks:

1) How do qualitative researchers code? Understanding the status quo helps us gain insight into users’ existing practices, needs, and desires.

2) Could the process of coding be partially automated, and should it be? If so, how far do the simplest NLP techniques take us?

We find that across disciplines, qualitative researchers follow several practices well suited to automation. Indeed, researchers desire automation, but only after they have coded a subset of their data themselves, and then only to extend that coding to unseen data. Additionally, researchers want any assistive tool to be transparent about its recommendations.
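To make the "code a subset first, then extend to unseen data" workflow concrete, here is a minimal, hypothetical sketch in Python. It is not the prototype described in the paper; it simply assumes that each chunk of text is a short string, that the researcher's hand-coded subset is a list of (text, codes) pairs, and that a code is suggested for an unseen chunk when enough of its words overlap with words from chunks the researcher already gave that code. The names `learn_code_keywords`, `suggest_codes`, `STOPWORDS`, and `min_overlap` are illustrative assumptions. In the spirit of the transparency researchers asked for, each suggestion carries the matched words that triggered it.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop list; a real tool would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "i", "we",
             "it", "is", "was", "my", "for", "on", "that", "this"}

def tokenize(text):
    """Lowercase word tokens using a simple regular expression."""
    return re.findall(r"[a-z']+", text.lower())

def learn_code_keywords(coded_chunks):
    """Map each code to the content words seen in chunks the researcher labeled with it.

    coded_chunks: list of (text, set_of_codes) pairs from the manually coded subset.
    """
    keywords = defaultdict(set)
    for text, codes in coded_chunks:
        words = {w for w in tokenize(text) if w not in STOPWORDS}
        for code in codes:
            keywords[code] |= words
    return keywords

def suggest_codes(text, keywords, min_overlap=2):
    """Suggest codes for an unseen chunk.

    Returns {code: sorted matched words}, so the researcher can see why each
    code was suggested and accept or reject it.
    """
    words = set(tokenize(text))
    suggestions = {}
    for code, vocab in keywords.items():
        matched = words & vocab
        if len(matched) >= min_overlap:
            suggestions[code] = sorted(matched)
    return suggestions
```

For example, with two made-up coded chunks, `learn_code_keywords([("I waited months for a neurology appointment", {"access to care"}), ("The seizures scared my family", {"emotional impact"})])` followed by `suggest_codes("We waited weeks for an appointment", kw)` suggests "access to care" and cites "appointment" and "waited" as the evidence. Keeping the final decision with the researcher is what preserves their agency; the tool only proposes.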

Based on our findings, we built prototypes to partially automate coding while maintaining researcher agency. Our top-performing system uses simple NLP techniques to generate coding that performs as well as human coders on inter-rater reliability measures.
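Inter-rater reliability measures how much two coders agree beyond what chance alone would produce. The post does not say which measure was used; Cohen's kappa is one common choice, shown here purely as an illustration: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected from each coder's label frequencies. The function name `cohens_kappa` and the example labels are hypothetical.

```python
def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' labels over the same list of items.

    p_o: fraction of items on which the coders agree.
    p_e: chance agreement, summing P(a says label) * P(b says label) over labels.
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must label the same, non-empty list of items")
    n = len(coder_a)
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    labels = set(coder_a) | set(coder_b)
    p_e = sum((coder_a.count(lab) / n) * (coder_b.count(lab) / n) for lab in labels)
    if p_e == 1:
        # Both coders used one identical label for every item: kappa is
        # undefined (0/0), so return 1.0 by convention for perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Made-up data: whether one code applies (1) or not (0) to ten chunks.
human = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
machine = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
print(cohens_kappa(human, machine))
```

Here the coders agree on 8 of 10 chunks (p_o = 0.8), chance agreement is 0.4 × 0.4 + 0.6 × 0.6 = 0.52, and κ ≈ 0.58. One way to read a claim like "performs as well as human coders" is to compute this score for human–human pairs and for human–machine pairs on the same data and compare them.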

So, should qualitative coding be partially automated? Definitely maybe.

** We do not mean coding as in programming or writing software. Qualitative coding, originating in the humanities and social sciences, is a process of data classification that is roughly similar to the UX research technique of affinity diagramming.

Megh Marathe and Kentaro Toyama. 2018. Semi-Automated Coding for Qualitative Research: A User-Centered Inquiry and Initial Prototypes. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI ’18). ACM.

Megh Marathe

U. Michigan PhD student who studies how epileptic seizures are experienced by people and diagnosed by neurologists. Disability studies, STS, HCI, data science.