Evaluating brand awareness surveys in minutes instead of hours
This article walks you through automatically analyzing brand awareness and similar questions using codit.co.
Say you have published a survey on Google Forms (or, even better, with one of the beautiful questionnaires from Typeform), asking participants a simple question:
What bank(s) would you consider for your personal banking needs?
As a result, you want to know how often each institution was mentioned, something similar to this chart.
You would think this should be a no-brainer, right? Well, not so fast. The responses now start to trickle in, and they make you sigh:
- Credit Suissse
- UBS/Merrill Lynch
- Raiffeisen Merryll Lynch
There are two obvious problems:
- Typos are abundant (you’d be baffled by how many ways there are to misspell “Merrill Lynch”).
- Although you provided 5 fields for brand names, many respondents used only the first one and typed several brands into it. To separate them, some used commas, others dashes, and some used no delimiter at all.
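The delimiter part of this problem is the easy bit to automate. Here is a minimal Python sketch (not how codit.co works internally) that splits a response on commas, semicolons, slashes, and dashes; the delimiter set and the `split_brands` name are illustrative assumptions. It cannot separate “Raiffeisen Merryll Lynch”, because nothing marks where one brand ends and the next begins.

```python
import re

# Assumed delimiter set: commas, semicolons, slashes and dashes, with any
# surrounding whitespace. Note this would also split hyphenated brand names.
DELIMITERS = re.compile(r"\s*[,;/-]\s*")

def split_brands(response: str) -> list[str]:
    """Split one free-text answer into candidate brand mentions."""
    parts = DELIMITERS.split(response)
    # Drop empty fragments left behind by trailing or doubled delimiters.
    return [p.strip() for p in parts if p.strip()]

for answer in ["Credit Suissse", "UBS/Merrill Lynch", "Raiffeisen Merryll Lynch"]:
    print(answer, "->", split_brands(answer))
```

The last answer comes back as a single fragment, which is exactly why undelimited responses need a smarter, statistics-based approach.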
You ponder how to proceed.
You could go through every single response and fill out a tally sheet. This traditional approach would certainly work, but if hundreds or even thousands of people took part in your super-intriguing survey, it would probably take you a full day.
Alternatively, you could use standard survey analysis or NLP software, where you would have to define rules by hand specifying which spellings belong together.
Finally, you could fire up Python or R and try to find a way of doing this at least semi-automatically. But anyone who has done real data science work knows that this is no easy task, and there is no guarantee it will work, let alone generalize to future datasets.
None of these options looks very attractive.
So, what is the alternative? Statistics and machine learning to the rescue!
At codit.co, we have developed a method to analyze questions like this (also known as list or semi-open questions) in just a few clicks. Our approach relies heavily on statistics, but it also draws on the combined knowledge from all surveys evaluated on the platform.
1. Upload your data
You can download our example dataset here.
(Note: All data is artificially generated and does not represent any real consumer opinions)
After creating an account on codit.co, click Import
➡ Select the file you downloaded above, click Next
➡ Choose List question, click Next
➡ Name the project Demo: Bank awareness, hit Save
2. Run the Auto-Coder
Open the project you just created. The first time you open it, a wizard will pop up to guide you through the process; hit Next.
You can see that the software grouped the various spelling variants into the correct categories. It is even smart enough to recognize that “Bank of America” is a single entity consisting of multiple words and should be kept together, while “Bank Raiffeisen” (which occurs in the demo file a couple of times) is not a meaningful term.
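To give a feel for what grouping spelling variants involves, here is a simple stand-in built on Python’s standard `difflib`: each variant is assigned to the first, more frequent spelling whose similarity ratio reaches a threshold. This is not codit.co’s algorithm (which, as described above, also draws on knowledge from other surveys); the `group_variants` function and the 0.8 threshold are illustrative assumptions.

```python
from collections import Counter
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1], computed by difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def group_variants(mentions: list[str], threshold: float = 0.8) -> dict[str, str]:
    """Greedily map each spelling variant to a canonical label.

    Spellings are visited from most to least frequent, so the most common
    spelling of each group becomes its label.
    """
    counts = Counter(m.strip() for m in mentions)
    canonical: list[str] = []
    mapping: dict[str, str] = {}
    for variant, _ in counts.most_common():
        # Join the first existing group that is similar enough, if any.
        match = next((c for c in canonical if similarity(variant, c) >= threshold), None)
        if match is None:
            canonical.append(variant)  # start a new group
            match = variant
        mapping[variant] = match
    return mapping

mentions = ["Merrill Lynch", "Merryll Lynch", "Merrill Lynch", "Meril Lynch",
            "Credit Suisse", "Credit Suissse", "Credit Suisse", "UBS", "USB"]
mapping = group_variants(mentions)
print(mapping)
```

With these toy mentions, the misspellings of “Merrill Lynch” and “Credit Suisse” are folded into their correct forms, but “USB” stays in its own group because it scores below the threshold against “UBS”. That is the same kind of leftover we fix by hand in the next step.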
You can play around with the Minimum count and Similarity levers, but the default settings already give great results. There are only a few minor things we want to correct, and we will do that in the next step. In this view, it is better to end up with categories that are too fine-grained than too coarse. The reason is simple: merging and deleting later on is easy; separating is hard.
We don’t actually need to do anything other than hit Save. Read the further instructions and press Close.
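Assuming the Minimum count lever behaves like a plain frequency cutoff (an assumption for illustration, not a description of codit.co’s internals), its effect can be sketched as follows: codes mentioned fewer than `min_count` times are folded into an “Other” bucket. `apply_min_count` and the toy data are hypothetical.

```python
from collections import Counter

def apply_min_count(codes: list[str], min_count: int = 2) -> dict[str, int]:
    """Keep codes mentioned at least `min_count` times.

    Rarer codes are pooled under "Other" so that no mention is silently lost.
    """
    counts = Counter(codes)
    kept: dict[str, int] = {}
    other = 0
    for code, n in counts.items():
        if n >= min_count:
            kept[code] = n
        else:
            other += n
    if other:
        kept["Other"] = other
    return kept

# Toy, made-up coded mentions.
coded = ["UBS", "UBS", "Credit Suisse", "Credit Suisse", "Credit Suisse", "PostFinance"]
print(apply_min_count(coded, min_count=2))
```

Lowering `min_count` keeps more, finer codes, which matches the advice above: several small codes can always be merged later, whereas one oversized code cannot easily be split apart again.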
3. Refine results
We’re almost done, only a few details require tweaking.
When checking the resulting codes, we see that “USB” and “UBS” are separate. To merge them, select both rows and click the Layers icon in the top right (see the screenshot below).
Then select Merrill Lynch and Bank of America, click the edit button, and change their category to US Institutes. Do the same for the Swiss banks, assigning them to Swiss Institutes. Finally, remove the Other code, since it is not used here, and sort the codes by category by clicking the Category header.
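Conceptually, merging and recategorizing codes is simple bookkeeping. The following sketch, using made-up toy counts and hypothetical names (`merge_codes`, `categories`), folds “USB” into “UBS”, assigns each code to a category, and sorts by category (and, as our own choice, by descending count within each category).

```python
from collections import Counter

def merge_codes(counts: Counter, merges: dict[str, str]) -> Counter:
    """Fold each source code into its target code, summing their counts."""
    merged = Counter()
    for code, n in counts.items():
        merged[merges.get(code, code)] += n
    return merged

# Toy counts for illustration only.
counts = Counter({"UBS": 4, "USB": 1, "Credit Suisse": 3, "Raiffeisen": 2,
                  "Merrill Lynch": 3, "Bank of America": 2})
merged = merge_codes(counts, {"USB": "UBS"})

categories = {
    "UBS": "Swiss Institutes", "Credit Suisse": "Swiss Institutes",
    "Raiffeisen": "Swiss Institutes",
    "Merrill Lynch": "US Institutes", "Bank of America": "US Institutes",
}

# Sort by category name first, then by descending count within a category.
rows = sorted(merged.items(), key=lambda kv: (categories[kv[0]], -kv[1]))
for code, n in rows:
    print(f"{categories[code]:<17} {code:<16} {n}")
```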
Et voilà! You’re done. Your resulting graph (which you can find in the Code Frequencies tab) should now look exactly like the one presented at the beginning of this post. You can, of course, also download the detailed results as Excel, SPSS, CSV or JSON files.
What we’re most proud of: the entire process took fewer than 20 clicks and no more than 5 minutes, and the result is of the same or even better quality than doing everything by hand.
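When working with frequency results for awareness questions, it is common to count respondents rather than raw mentions, so a person who names UBS twice counts only once. The sketch below does that with toy data and writes the result as JSON and CSV using only the standard library. It illustrates the idea and is not codit.co’s export format; `code_frequencies` and the field names are hypothetical.

```python
import csv
import io
import json
from collections import Counter

# Toy coded data: one list of codes per respondent (made up for illustration).
coded_responses = [
    ["UBS", "Merrill Lynch"],
    ["Credit Suisse", "UBS", "UBS"],  # duplicate mention within one answer
    ["Raiffeisen"],
]

def code_frequencies(responses: list[list[str]]) -> Counter:
    """Count how many respondents mentioned each code (at most once each)."""
    return Counter(code for codes in responses for code in set(codes))

freq = code_frequencies(coded_responses)

# JSON: a list of {"code", "respondents"} objects, most frequent first.
as_json = json.dumps(
    [{"code": c, "respondents": n} for c, n in freq.most_common()], indent=2
)

# CSV written to an in-memory buffer; use open(...) instead to write a file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["code", "respondents"])
writer.writerows(freq.most_common())
as_csv = buffer.getvalue()

print(as_json)
print(as_csv)
```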