Scattertext Visualizations

An elegant, interactive tool to elevate your text analysis

Kristen Davis
Feb 28 · 4 min read

Data visualizations at their best tell a story with the data that is both compelling to look at and easy to digest. The Scattertext tool created by Jason Kessler (check out the original documentation here) does just that. This guide will show you how to implement Scattertext with your data and bring the visual WOW to your work.

Getting Started:

Psst! If you don’t need help formatting your data, skip down to “Turning Your Text into a Scattertext Corpus”!

Scattertext allows you to visualize the distinctive terms in your corpus and how their frequency differs from one category to another. In my example, I am comparing terms used in an educational standards document developed by the NGSS with those in the educational standards document developed by the state of Alaska, but any text with two categories you wish to compare will do.

For this project, I was working with PDFs that, in their raw form, were not suitable for modeling. To format the text data quickly, I created a custom function that took in a file, lowercased the text, tokenized it into words, removed stop words and punctuation, and returned a list of cleaned words.

Custom function for cleaning and formatting text data
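The original function is not reproduced here, so the following is only a minimal sketch of the steps described above, using the standard library alone. The names `clean_text`, `clean_file`, and `STOP_WORDS` are my own, the stop-word list is a tiny illustrative one, and the sketch assumes the PDF text has already been extracted to a plain-text file with a PDF library of your choice.

```python
import string

# A tiny illustrative stop-word list (an assumption); swap in a fuller list,
# such as NLTK's English stop words, for real work.
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
              "in", "is", "it", "of", "on", "or", "that", "the", "to", "with"}


def clean_text(raw_text, stop_words=STOP_WORDS):
    """Lowercase, tokenize, and drop stop words and punctuation."""
    lowered = raw_text.lower()
    # Split on whitespace, then strip punctuation from the edges of each token.
    tokens = (tok.strip(string.punctuation) for tok in lowered.split())
    # Discard tokens that were pure punctuation and tokens in the stop list.
    return [tok for tok in tokens if tok and tok not in stop_words]


def clean_file(path, stop_words=STOP_WORDS):
    """Read a plain-text file (text already extracted from the PDF) and clean it."""
    with open(path, encoding="utf-8") as handle:
        return clean_text(handle.read(), stop_words)


print(clean_text("The students ARE asking questions, and defining problems."))
```

Splitting on whitespace is the simplest possible tokenizer; a library tokenizer will handle contractions and hyphenated words more gracefully.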

Using the custom-built function, read in the PDF and build a data frame. I’ve added a few extra cleaning steps, such as casting the text column to a string in order to strip the unwanted brackets that appeared when I created the data frame from the list.

Read Data & Build Data Frame
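As a rough illustration of this step, the sketch below builds a two-column pandas data frame from cleaned word lists. The column names `category` and `text`, the helper `build_frame`, and the sample words are all hypothetical. It also takes a slightly different route from the one described above: joining each word list into a single string up front means no brackets ever reach the text column, so there is nothing to strip afterwards.

```python
import pandas as pd


def build_frame(cleaned_docs):
    """Build one row per document from {category: list_of_cleaned_words}."""
    rows = [
        # Joining the words with spaces gives a plain string, so no list
        # brackets end up in the text column.
        {"category": category, "text": " ".join(words)}
        for category, words in cleaned_docs.items()
    ]
    return pd.DataFrame(rows, columns=["category", "text"])


# Hypothetical cleaned word lists standing in for the two standards documents.
df = build_frame({
    "NGSS": ["students", "develop", "models"],
    "Alaska": ["students", "investigate", "phenomena"],
})
print(df)
```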

Turning Your Text into a Scattertext Corpus

You’ll need to have the spaCy and Scattertext libraries. A guide to installing the spaCy library can be found here and the Scattertext library here. Once you have them installed, the first thing you will need to do is build a Scattertext corpus.

Create Scattertext Corpus

Breaking down each variable inside the parentheses, in order: first the name of the data frame you are using, then the name of the column with the categorical information, and finally the name of the column with your text data.
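The corpus-building call can be sketched roughly as follows, using Scattertext's `CorpusFromPandas` builder. The wrapper `build_corpus` and the default column names `category` and `text` are my own assumptions, as is the choice of the `en_core_web_sm` spaCy model; substitute your own data frame columns and model.

```python
def build_corpus(df, category_col="category", text_col="text"):
    """Turn a data frame into a Scattertext corpus."""
    # Imported here so the heavy libraries load only when a corpus is built.
    import scattertext as st
    import spacy

    # Assumes the small English model was installed beforehand with
    # `python -m spacy download en_core_web_sm`.
    nlp = spacy.load("en_core_web_sm")
    return st.CorpusFromPandas(
        df,                         # the data frame
        category_col=category_col,  # column holding the category labels
        text_col=text_col,          # column holding the text
        nlp=nlp,                    # spaCy pipeline used to parse the text
    ).build()
```

With a data frame shaped as described above, `corpus = build_corpus(df)` would give you the object used in the rest of this guide.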

Once you have your corpus built, you can quickly find the words that most set it apart from general English using the get_scaled_f_scores_vs_background method.

Print the Top 10 Corpus Unique Words
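A minimal sketch of that lookup, assuming `corpus` is the Scattertext corpus built in the previous step. The helper name `top_characteristic_terms` is mine; the only library call it relies on is `get_scaled_f_scores_vs_background`, which returns a data frame indexed by term with the highest-scoring terms first.

```python
def top_characteristic_terms(corpus, n=10):
    """Return the n terms that most set the corpus apart from general English."""
    scores = corpus.get_scaled_f_scores_vs_background()
    # The frame is indexed by term, so the first n index entries are the
    # n most characteristic terms.
    return list(scores.index[:n])
```

In a notebook you would call `print(top_characteristic_terms(corpus))`.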

You can also use term frequency to identify words most associated with a given category in your corpus.

Print the Top 20 Words Associated with a Category

Heads Up: You need to run this word frequency count for BOTH of your categories before attempting to graph the Scattertext visualization!
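One way to cover both categories in a single pass is sketched below. It assumes `corpus` is a Scattertext corpus and uses its `get_term_freq_df` and `get_scaled_f_scores` methods; the helper name `top_terms_by_category` and the `"<category> Score"` column naming are my own choices.

```python
def top_terms_by_category(corpus, categories, n=20):
    """Map each category to its n most associated terms."""
    term_freq_df = corpus.get_term_freq_df()
    top_terms = {}
    for category in categories:
        score_col = f"{category} Score"
        # Scaled F-score of every term for this category.
        term_freq_df[score_col] = corpus.get_scaled_f_scores(category)
        # Highest-scoring terms first.
        ranked = term_freq_df.sort_values(by=score_col, ascending=False)
        top_terms[category] = list(ranked.index[:n])
    return top_terms
```

For the example in this guide, the call would look something like `top_terms_by_category(corpus, ["NGSS", "Alaska"])`, where the two strings are the values in your category column.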

Finally, the moment you’ve all been waiting for! If you have not rendered HTML in your Jupyter Notebook before, make sure you import display and HTML from IPython.display first.

Build an Interactive Scattertext Visualization

Breaking down each variable inside the parentheses, in order: your Scattertext corpus, the category you wish to graph (note that this is a value from the category column, not the column name), how you want that category to appear on your visual, how you want the opposite category to appear on your visual, and finally the width of the visual in pixels.

This may take a minute or two to run and will return a set of numbers within the Jupyter Notebook when complete. It will also save the visual to the specified file. To access the visual, go to the folder you created the visualization in and single-click the link.
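The build-and-save step can be sketched as below, using Scattertext's `produce_scattertext_explorer` with the arguments described above. The helper names `write_html` and `save_explorer` are my own, and the default width of 1000 pixels is just an example value.

```python
def write_html(html, path):
    """Save the HTML string to disk and return the number of bytes written."""
    with open(path, "wb") as handle:
        return handle.write(html.encode("utf-8"))


def save_explorer(corpus, path, category, category_name,
                  not_category_name, width_in_pixels=1000):
    """Render the interactive Scattertext plot for `corpus` and save it to `path`."""
    # Imported here so the library loads only when a plot is rendered.
    import scattertext as st

    html = st.produce_scattertext_explorer(
        corpus,
        category=category,                    # value in the category column
        category_name=category_name,          # label shown for that category
        not_category_name=not_category_name,  # label shown for the other one
        width_in_pixels=width_in_pixels,
    )
    return write_html(html, path)
```

A call such as `save_explorer(corpus, "scattertext_visual.html", "NGSS", "NGSS", "Alaska")` (the file name is hypothetical) writes the page to disk. The number it returns is the count of bytes written, which is one likely source of the numbers that appear in the notebook output.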

Depending on the size of your visualization, it may take anywhere from several seconds to many minutes to load. When it does, boom! There you go! A beautiful interactive text visualization tool!

Bonus: Host Your Visualization

One final note: what’s the good of an interactive tool that lives only inside your notebook? Use a site like tiiny.host to deploy your Scattertext visualization and host it live with a clickable, shareable link!

Happy Graphing!
