Evaluating brand awareness surveys in minutes instead of hours

This article walks you through automatically analyzing brand-awareness-style questions using codit.co.

Say you have published a survey on Google Forms (or, even better, using one of the beautiful questionnaires from Typeform), asking participants a simple question:

What bank(s) would you consider for your personal banking needs?

As a result, you are interested in how often each institution was mentioned, something similar to this chart.

The result we are looking for. (Data is artificially generated and does not represent any consumer opinions)

You would think this should be a no-brainer, right? Well, not so fast. The responses now start to trickle in, and they make you sigh:

  • Credit Suissse
  • UBS/Merrill Lynch
  • Raiffeisen Merryll Lynch

There are two obvious problems:

  1. Typos are abundant (you’d be baffled to see how many ways there are to misspell “Merrill Lynch”).
  2. Although you provided five fields into which people could type brand names, many used only the first one and entered multiple brands into it. To separate the brands, some used commas, some used dashes, and others didn’t add any delimiter at all.

You ponder how to proceed.
One option is to go through every single response and fill out a tally sheet. This traditional approach would definitely work, but given that hundreds or even thousands of people participated in your super-intriguing survey, it would probably take you a full day to get this done.
Alternatively, you could try standard survey analysis or NLP software, in which you would have to define by hand which spellings belong together.
Finally, you could fire up Python or R and try to figure out a way of doing this at least semi-automatically. But everyone who has ever done any real data science work knows this is no easy task, and there is no guarantee it will work, let alone be applicable to future datasets.
None of these options looks very attractive.
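
To make the do-it-yourself route concrete, here is a minimal Python sketch of what a semi-automatic attempt might look like. It is an illustration only: the brand list, the function names and the 0.8 cutoff are assumptions made up for this example, and it uses the fuzzy matcher from Python's standard library rather than anything codit.co does internally.

```python
import re
from collections import Counter
from difflib import get_close_matches

# Hypothetical list of canonical brand names; you would have to maintain
# this by hand for every new survey.
KNOWN_BRANDS = ["Credit Suisse", "UBS", "Merrill Lynch", "Raiffeisen"]

# Lower-cased lookup so that matching ignores capitalization.
_LOOKUP = {brand.lower(): brand for brand in KNOWN_BRANDS}


def split_mentions(response):
    """Split one free-text answer on commas, semicolons, slashes and spaced dashes."""
    parts = re.split(r"\s*[,;/]\s*|\s+-\s+", response)
    return [p.strip() for p in parts if p.strip()]


def match_brand(mention, cutoff=0.8):
    """Return the closest known brand, or None if nothing is similar enough."""
    hits = get_close_matches(mention.lower(), list(_LOOKUP), n=1, cutoff=cutoff)
    return _LOOKUP[hits[0]] if hits else None


def tally(responses):
    """Count mentions per brand; anything unrecognized is kept for manual review."""
    counts = Counter()
    for response in responses:
        for mention in split_mentions(response):
            counts[match_brand(mention) or "UNMATCHED: " + mention] += 1
    return counts


responses = ["Credit Suissse", "UBS/Merrill Lynch", "Raiffeisen Merryll Lynch"]
print(tally(responses))
```

On the three sample responses from above, the first two are resolved correctly, but "Raiffeisen Merryll Lynch" ends up unmatched because it contains no delimiter to split on. Handling cases like this, and keeping the brand list up to date, is where the hand-rolled approach starts to eat up time.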


So, how is it done instead? Statistics and machine learning to the rescue

At codit.co, we have developed a sophisticated method to analyze questions like this (also known as list or semi-open questions) with only a few clicks. Our approach relies heavily on statistics but also makes use of the combined knowledge present in all surveys evaluated on the platform.

1. Upload your data

You can download our example dataset here.
(Note: All data is artificially generated and does not represent any real consumer opinions)
After creating an account on codit.co, click Import
➡ Select the file you downloaded above, click Next
➡ Choose List question, click Next
➡ Name the project Demo: Bank awareness, hit Save

2. Run the Auto-Coder

Open the project you just created. The first time you open it, a wizard pops up to guide you through the process. Hit Next.

The wizard guiding you through automatic categorization. It succeeded both in discovering typos and in separating multiple mentions from the same row.

You can see that the software was able to group the different spelling variants into the correct categories. It is even smart enough to notice that “Bank of America” is an entity consisting of multiple words and should be kept together, while “Bank Raiffeisen” (which does occur in the demo file a couple of times) is not a term that makes sense.

You can play around with the Minimum count and Similarity levers, but the default settings already provide great results. There are only a few minor things we want to correct, and we will do that in the next step. In this view, it is better to make the categories too fine-grained rather than too coarse. The reason is simple: merging and deleting later on is easy, whereas separating is hard.
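
To get an intuition for what two such settings could control, here is a small Python sketch of a greedy grouping routine. It is not the algorithm codit.co uses; the function names and the parameters min_similarity and min_count are hypothetical stand-ins for the two levers, and the similarity score comes from the standard library's SequenceMatcher.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity between 0 and 1, ignoring capitalization."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_mentions(mentions, min_similarity=0.8, min_count=2):
    """Greedily group spelling variants, starting with the most frequent spelling."""
    groups = {}  # canonical spelling -> total number of mentions
    for text, count in Counter(mentions).most_common():
        for canonical in groups:
            if similarity(text, canonical) >= min_similarity:
                groups[canonical] += count  # fold the variant into an existing group
                break
        else:
            groups[text] = count  # no group was close enough, so start a new one
    # Drop groups mentioned too rarely to deserve their own code.
    return {name: n for name, n in groups.items() if n >= min_count}


# Made-up sample mentions, for illustration only.
mentions = ["Merrill Lynch", "Merrill Lynch", "Merryll Lynch",
            "UBS", "UBS", "USB", "Raiffeisen"]
print(group_mentions(mentions))
print(group_mentions(mentions, min_similarity=0.6, min_count=1))
```

With the stricter settings, "Merryll Lynch" is folded into "Merrill Lynch", while "USB" stays apart from "UBS" and is then dropped for being too rare. Lowering both thresholds merges "USB" into "UBS" and keeps the single "Raiffeisen" mention. In this sketch, a lower similarity threshold merges more aggressively, which is why starting strict and merging by hand afterwards tends to be the safer direction.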

We don’t actually need to do anything other than hit Save. Read the further instructions, then press Close.

3. Refine results

We’re almost done; only a few details require tweaking.
When checking the resulting codes, we see that “USB” and “UBS” are separate. To merge them, select both rows and click the Layers icon on the top right (see the screenshot below).

Merging multiple codes into one is simple.

Then, select Merrill Lynch and Bank of America, click the edit button and change their category to US Institutes. Do the equivalent for the Swiss Institutes. Finally, remove the Other code, as it is not used here, and sort the codes by category by clicking the Category header.

Et voilà! You’re done. Your resulting graph (which you find in the Code Frequencies tab) should now look exactly like the one presented at the beginning of this post. You can of course also download the detailed results as Excel, SPSS, CSV or JSON files.

The detailed, response-level code assignments can be downloaded as Excel, CSV, JSON or SPSS files.

What we’re most proud of: the entire process took fewer than 20 clicks and no more than 5 minutes, and the result is of the same or even better quality than doing everything by hand.


Have remarks or feedback? Write to us at support (at) codit.co

You can also use codit.co to speed up coding of your NPS, employee feedback or brand perception surveys.
To learn more about the platform visit our website or book a quick demo.