Using GPT-3 to Code Open-Ended Survey Responses: Realities and Limitations

Karine Pepin
2 min read · Jan 11, 2023


Along with a professional software engineer, I attempted to code 4,000 open-ended responses using GPT-3, the family of models behind ChatGPT.

To be clear, the task at hand is considerably more complex than simply asking ChatGPT to extract the key concepts from a given text, such as a transcript of a focus group discussion. Open-ended responses are coded at the respondent level, and the codes need to reflect the responses precisely and consistently. ChatGPT also cannot load or analyze an Excel file.

As someone with no computer science background, my biggest takeaway is that it’s not a simple ‘plug-and-play’ solution. These are some of the lessons I learned:

  • Even though the API is relatively simple to use, integrating it into tools still requires some technical assembly, and it is not as straightforward as one might expect. For example, creating a Google Sheets formula that can point to cells and use their values as prompts requires a bit of JavaScript (Apps Script) programming knowledge.
  • An important thing to understand about GPT-3 is that it relies on training drawn from general, Internet-available knowledge. This is a vast dataset, but it also means that GPT-3’s training likely does not include the information directly relevant to your survey.
  • As a result, additional survey-specific context, such as examples of how to code responses or the list of codes used to categorize them (i.e., a code frame), must be included in every single API call. Given that GPT-3 usage is billed by the amount of text (measured in tokens) in each call, this makes the API calls much more expensive.
  • The cost of the API calls depends on the underlying “model” used. For coding 4,000 responses, we estimated the cheapest model, Ada, to cost around $900, while the most expensive model, Davinci (which, coincidentally, is the one closest to ChatGPT), would likely cost around $4,500. This is more expensive than a typical coding job!
  • The problem is that, after some testing, only the Davinci ($$$) model coded responses relatively well, and even then the quality was questionable.
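To make the cost mechanics above concrete, here is a minimal sketch in Python of the pattern described: because GPT-3 has no memory between calls, the code frame and a few worked examples must travel with every prompt, and the bill scales linearly with that prompt size. The code frame, example responses, and token counts below are invented for illustration; only the $0.02 per 1,000 tokens figure reflects Davinci's published early-2023 rate.

```python
# Hypothetical code frame for a "why did you switch brands?" open end.
CODE_FRAME = """Codes:
1. Price / too expensive
2. Product quality
3. Customer service
4. Other"""

# A couple of worked examples (few-shot prompting), also repeated every call.
FEW_SHOT_EXAMPLES = """Response: "It just cost too much" -> 1
Response: "The support team never answered" -> 3"""

def build_prompt(open_end: str) -> str:
    """Assemble one per-respondent prompt; the frame rides along each time."""
    return f'{CODE_FRAME}\n\n{FEW_SHOT_EXAMPLES}\n\nResponse: "{open_end}" ->'

def estimate_cost(n_responses: int, tokens_per_call: int,
                  price_per_1k_tokens: float) -> float:
    """Rough bill: number of calls x tokens per call x per-token price."""
    return n_responses * (tokens_per_call / 1000) * price_per_1k_tokens

# Illustrative only: 4,000 responses at ~1,000 tokens per call
# (frame + examples + response + completion), at $0.02 per 1,000 tokens.
print(estimate_cost(4000, 1000, 0.02))  # 80.0
```

A longer code frame or more examples multiplies `tokens_per_call`, and the cost grows in direct proportion, which is why stuffing survey context into every call adds up so quickly.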

I really wanted GPT-3 to code my 4,000 open ends in 2 minutes for free, but that’s not how the story ends! In conclusion, GPT-3 has a lot of potential, but API calls need to become cheaper for the economics to make sense. In the meantime, humans or human/AI hybrid solutions remain the most cost-effective and reliable options for quality coding.

--


Karine Pepin

Karine is a seasoned quantitative researcher with 20 years of experience. She is passionate about research, which she feels is as much an art as a science.