a word cloud

Visualization of customer feedback (reasons behind feedback): Word Cloud

Rajesh Gudikoti
IBM Data Science in Practice
3 min read · Apr 22, 2019


Problem statement: We collect feedback from customers and analyze it. After the analysis, we typically conclude that we received x% positive feedback, y% negative feedback, and so on.

What we really want to understand are the reasons behind the positive feedback and every other kind of feedback.

This article focuses on visualizing the feedback from customers who gave negative reviews.

On the technology front, I will use the “wordcloud” Python library and an airline industry dataset from Kaggle.

For the airline industry use case, common feedback topics are flight delays, baggage issues, hospitality, etc.

For one airline, delays may be the major issue, and for another it may be baggage. Here, we try to represent the reasons in the form of a word cloud.

Sample data is shown in the image below:

screenshot of a Jupyter notebook looking at the airline dataset

As usual, the data is pre-processed before applying analytics. I will skip the pre-processing details and focus on word cloud generation.

To create a word cloud in Python, we need only a few lines of code:

from wordcloud import WordCloud

# stop_words, data_nouns_adj and c come from the pre-processing step (not shown here)
wc = WordCloud(stopwords=stop_words, background_color="white", colormap="Dark2",
               max_font_size=150, random_state=42)
wc.generate(data_nouns_adj.reviews[c])

I picked the words that represent nouns and adjectives in the customer reviews. Let us generate word clouds from these words, as shown below.

import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = [16, 6]

airline_name = ['virginamerica', 'united', 'southwest', 'delta', 'usairways']

# print(data.columns)  # *** from dtm pickle file
# print(data_clean)

# Create one subplot per airline
for index, c in enumerate(data.columns):
    # print(c)
    wc.generate(data_nouns_adj.reviews[c])

    plt.subplot(3, 4, index + 1)
    plt.imshow(wc, interpolation="bilinear")
    plt.axis("off")
    plt.title(airline_name[index])

plt.show()
set of word clouds from each airline from the dataset

This did not give me the insight I needed for my use case. I wanted to understand the main pain points shared by customers.

As a next attempt, I picked the verbs instead and generated the word clouds again. This gave me better insight into the customer reviews. As shown in the image below, for a few airlines flight cancellation is the main pain point, while for a few others delay is the pain point.

(We can group the keywords “delayed” and “waiting” together, since both tend to mean the same thing.)
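One way to do that grouping is to rewrite the review text before it is passed to the word cloud, so that variant keywords are counted as a single word. The sketch below is only an illustration: the `CANONICAL` mapping and the `merge_keywords` helper are hypothetical names that do not appear in the original notebook, and the sample sentence is made up.

```python
import re

# Hypothetical mapping from variant keywords to one canonical keyword.
CANONICAL = {
    "waiting": "delayed",
    "waited": "delayed",
    "delay": "delayed",
}


def merge_keywords(text, mapping=CANONICAL):
    """Lowercase the text and replace each mapped word with its canonical form."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return " ".join(mapping.get(token, token) for token in tokens)


# Made-up review text, used only to show the effect of the mapping.
sample = "Flight delayed again, still waiting at the gate"
merged = merge_keywords(sample)
print(merged)
```

The merged string could then be passed to `wc.generate(...)` in place of the raw review text, so that both keywords contribute to the size of one word.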


import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = [16, 7]

airline_name = ['virginamerica', 'united', 'southwest', 'delta', 'usairways']

print(data.columns)  # *** from dtm pickle file
print(data_clean)

# Create one subplot per airline
for index, c in enumerate(data.columns):
    print(c)
    wc.generate(data_verb.reviews[c])

    plt.subplot(3, 4, index + 1)
    plt.imshow(wc, interpolation="bilinear")
    plt.axis("off")
    plt.title(airline_name[index])

plt.show()
set of generated word clouds per airline in the dataset
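The article does not show how the noun/adjective text (`data_nouns_adj`) and the verb text (`data_verb`) were built. A minimal sketch of one possible approach follows, assuming each review has already been tokenized and tagged with Penn Treebank part-of-speech tags (the tag set returned by, for example, `nltk.pos_tag`). The `keep_by_tag` helper and the hand-tagged sample are hypothetical and stand in for real tagger output.

```python
# Penn Treebank tag prefixes: NN* = nouns, JJ* = adjectives, VB* = verbs.
NOUN_ADJ_PREFIXES = ("NN", "JJ")
VERB_PREFIXES = ("VB",)


def keep_by_tag(tagged_tokens, prefixes):
    """Join the words whose tag starts with one of the given prefixes."""
    return " ".join(word for word, tag in tagged_tokens if tag.startswith(prefixes))


# Hypothetical, hand-tagged review standing in for real tagger output.
tagged_review = [
    ("flight", "NN"), ("was", "VBD"), ("cancelled", "VBN"),
    ("and", "CC"), ("rude", "JJ"), ("staff", "NN"),
    ("kept", "VBD"), ("us", "PRP"), ("waiting", "VBG"),
]

nouns_adj = keep_by_tag(tagged_review, NOUN_ADJ_PREFIXES)
verbs = keep_by_tag(tagged_review, VERB_PREFIXES)
print(nouns_adj)
print(verbs)
```

Each resulting string is plain text, so it can be handed to `wc.generate(...)` the same way the per-airline review text is in the code above.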

A word cloud is a good visualization technique for understanding text data. In this technique, the size of each word indicates its frequency or importance in the text.

Business Use Cases for Word Clouds
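To make the link between word size and frequency concrete, here is a small sketch that counts words and scales the counts relative to the most frequent word. It is not the wordcloud library's own layout algorithm: the `STOP_WORDS` set, the helper names, and the sample review are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop words; a real list would be much longer.
STOP_WORDS = {"the", "was", "and", "my", "for", "a", "to"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count how often each non-stop word occurs in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(token for token in tokens if token not in stop_words)


def relative_sizes(frequencies):
    """Scale counts so that the most frequent word gets 1.0."""
    top = max(frequencies.values())
    return {word: count / top for word, count in frequencies.items()}


# Made-up review text, used only for illustration.
review = "Delayed flight, delayed bags, and the gate agent was rude. Flight delayed!"
freqs = word_frequencies(review)
sizes = relative_sizes(freqs)
print(freqs.most_common(2))
```

A dictionary of counts like `freqs` can also be given directly to `WordCloud.generate_from_frequencies`. The final font sizes in the image still depend on settings such as `max_font_size` and on how the words fit into the canvas.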

1. Finding customer pain points — and opportunities to connect

2. Understanding how your employees feel about your company

3. Identifying new SEO terms to target

More details can be found at the link below:

https://www.boostlabs.com/what-are-word-clouds-value-simple-visualizations/


My other article

sms analysis to extract offers given by merchandise


I have spent two decades in the software industry working on Java, J2EE, BPM, and AI/NLP/ML. I enjoy writing articles, especially on NLP.