NLP visualizations for clear, immediate insights into text data and outputs

Using Plotly Express and Dash to explore data and present outputs in natural language processing (NLP) projects.

JP Hwang
Plotly
7 min read · Mar 30, 2020


Samples of NLP visualizations

Extracting information from text remains a difficult yet important challenge in the era of big data. Whether the source is customer feedback, social media posts, or the news, the sheer volume of data to be analyzed can bury the information we want to extract.

This is where modern natural language processing (NLP) tools come in. They can capture prevailing moods about a particular topic or product (sentiment analysis), identify key topics from texts (summarization/classification), or amazingly even answer context-dependent questions (like Siri or Google Assistant). Their development has provided access to consistent, powerful, and scalable text analysis tools for individuals and organizations.

Still, aspects unique to language can make it difficult to explore data for NLP or to communicate results. For instance, metrics that apply in the numerical domain may not be available for NLP. (What would be the mean, or the standard deviation, of a set of word tokens?) Even if they could be calculated, presenting the data to audiences can be challenging.

Data visualization can help with this, of course, but it can be time-consuming to learn a particular package. Building a web dashboard can be even more challenging—often requiring languages unfamiliar to NLP practitioners such as CSS, HTML, and JavaScript.

So, in this article, we wanted to share with you ways that Plotly Express and Dash can ease some of this pain.

Plotly Express and Dash were designed with code readability and succinctness as priorities, to enable easy creation of high-quality local (Plotly Express) and web dashboard (Dash) visualizations. In other words, they aim to have data visualization support your work, not have it become a new headache.

With that said, let’s get into it! We use a consumer complaints database corpus for this example, but the concepts and visualizations we discuss should be universally applicable.

The code is available in this GitHub repository, along with a deployed version of the app. Please feel free to follow along with this article, clone the repository, and make improvements!

(All analysis and notes here are for demonstration purposes only.)

Local visualizations

Data exploration

Our dataset contains over 18,000 rows and three columns. While this isn’t large by modern standards, it’s not really possible to ‘eyeball’ this raw data.

Let’s explore this dataset with Plotly Express, starting with the distribution of complaint counts by date (to see the trend over time):

Histogram of complaint counts by date

Now we’ll plot a histogram for the 20 companies with the most complaints:

Histogram of complaint counts by target company

Or by narrative length:

Histogram of complaint counts by narrative length

You may have noticed the succinctness of our code. Analyzing by multiple variables or switching to a log scale is also a cinch; just pass additional parameters as shown below:

Histogram of complaint counts by date (x-axis) and company (color)

Even better, these Plotly charts integrate seamlessly into Dash for dashboard generation as you will see later.

Now that we have looked at the distributions, let’s move on to review the text data in substance, starting with n-grams.

Visualizing n-grams

N-grams are simply sequences of n consecutive tokens (words). They have many practical applications and are a great exploratory tool. As single words can only tell us so much, let’s move straight to plotting counts of top bigrams.

Counts of top bigrams

Isn’t that neat? Most of these bigrams appear to indicate sensible groups of complaint types, and the counts show the volume of each group (credit report and credit card related complaints appear to be most common).

To drill down further into this data, a hierarchical visualization, such as a treemap, could be used. The example below divides the data by company and then by whether the phrase ‘credit report’ is included. Box sizes indicate group size, and color indicates average narrative length.

Treemap showing the total share of complaints, portion mentioning credit reports, and average lengths

Notice that the visualization immediately reveals length-related patterns. Credit report related complaints tend to be longer, and a couple of companies’ complaints also stand out generally.

In some cases, you may wish to compare proportions of complaint bigrams for each company, in which case a stacked bar might be useful:

Stacked bar chart showing complaint proportion by bigram

Companies with higher volumes of credit card complaints stand out immediately, as does one with a high share of student loan-related complaints.

For a closer review, we may even compare two companies directly, as done here for the top 50 bigrams:

Bigram comparisons for two companies

This enables an easy comparison of two datasets by subject matter.

Qualitative comparisons

While we don’t have time to get into the technical weeds, very broadly speaking, word embeddings (dense embeddings to be precise) enable qualitative comparisons of words. They can represent words, and, by extension, concepts or documents as high dimensional vectors, which also provide opportunities for interesting visualizations. Take a look at this simple representation of bigrams using a bubble chart:

Displaying bigram concepts in a bubble chart

Here, the high-dimensional bigram vectors are projected down to two dimensions using a dimensionality reduction technique called t-SNE.

Similar charts could be produced for any subset to compare text similarities and insights — say, for each company, or by length.

This might be a good opportunity to highlight that each of these charts was created in just a few lines of code using Plotly Express. Not only that, although you see static screenshots here, Plotly will generate interactive charts in your browser or notebook. Crucially, they can easily be incorporated into a live dashboard with Dash.

NLP dashboards made easy with Dash

The value proposition of Dash is similar to, and intertwined with, the qualities that made Python the leading language for NLP. It has a gentle learning curve, readable yet succinct code, a thriving community of users, and useful libraries and modules that can be leveraged to create dashboards.

Significantly for data scientists who are not also web developers, Dash abstracts many elements of web development to Python, allowing you and your team to remain in the Pythonic state of mind if desired.

Take a look at this Dash example for a navigation bar, and notice that the HTML/DOM elements are all created from within Python.

This is the web app that the snippet was taken from.

Demo Dash web app (link)

Dash provides Python interfaces to web-based components, while being declarative and reactive. Together, these enable easy creation of flexible, informative front ends that are accessible for everyone to interact with, whether for data exploration or presentations.

As foreshadowed above, incorporating one of these Plotly Express charts into Dash is straightforward.

For example — the word embedding bubble chart can be implemented in Dash like this:

As implemented, the user can select a parameter (perplexity) as a dropdown item, which initiates the callback function and updates the graph reactively — changing the 2-dimensional representation of the vectors. Below is a comparison of the bubble charts, at two different perplexity values.

Dash app t-SNE graphs at different parameters

This two-company bigram comparison is also incorporated in the Dash application as shown below.

Dash app — N-gram comparison component

More importantly, we only needed around 30 lines of code to add each Plotly Express chart to the Dash app, including interactivity and formatting, all without ever leaving Python. We think that this will ultimately improve productivity and efficacy for data scientists such as yourself.

Obviously, this only skims the surface of what is possible in NLP visualization, but we hope to have shown you the kind of simplicity and ease of use that we believe make Dash and Plotly powerful tools for NLP practitioners.

We invite you to explore the app and the code yourself, and to create your own visualizations, dashboards, and applications.

We are excited to see your own NLP visualizations built with Plotly Express and Dash. Feel free to share your graphics with us on Twitter at @plotlygraphs. To schedule a demo or learn more, visit https://plotly.com/get-demo/.
