WordClouds with Python

Meredith Wang
6 min read · Sep 16, 2022

A step-by-step guide to create and customize Word Clouds

Image by Author

Word clouds are one of the most powerful and straightforward ways to visualize text data. The size of each word depends on how often it occurs (a proxy for its importance in the context), which makes word clouds handy for Natural Language Processing (NLP) projects and text analysis. I played around with them for my recent project exploring the Metaverse.
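To make the frequency-to-size idea concrete, here is a simplified sketch in plain Python (no wordcloud involved). It counts words and scales the counts so the most frequent word gets 1.0. The wordcloud library normalizes frequencies the same way before turning them into font sizes, but its real tokenizer, stopword handling, and size scaling are more involved; the helper name `relative_frequencies` and the sample sentence are made up for illustration.

```python
from collections import Counter
import re

def relative_frequencies(text, top_n=5):
    """Count words and scale counts so the most frequent word gets 1.0."""
    # Lowercase, then keep runs of letters and apostrophes as words
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common(top_n)
    max_count = counts[0][1]
    # Bigger relative frequency -> bigger font in the word cloud
    return {word: count / max_count for word, count in counts}

sample = "metaverse virtual reality metaverse avatars virtual metaverse"
freqs = relative_frequencies(sample)
print(freqs)
```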

In this article, I will show you how to create word clouds in Python and get creative with them.

Now let’s get started!

Step 1: Install Packages

We will need the wordcloud package to generate the visuals for us. Keep in mind that wordcloud depends on the essential libraries NumPy and Pillow. We also need matplotlib to display the image or save it locally.

Depending on your environment, type one of the commands below into your terminal (Linux / macOS) or command prompt (Windows).

If you’re using pip:

$ pip install wordcloud

If you are using conda:

$ conda install -c conda-forge wordcloud

Step 2: Body of Text

For this article, I acquired the data by scraping Wikipedia pages related to the FAANG companies with Beautiful Soup. If you want to know the detailed steps I took to get the text, please visit this notebook on my GitHub.

For data preparation, I cleaned the text using the following steps:

If you want to see the code and functions, please visit this prepare.py file.
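The exact cleaning steps live in prepare.py and are not reproduced here, so the following is not the author's pipeline. It is a generic sketch of common cleaning steps for scraped Wikipedia text (lowercasing, dropping citation markers like [1], removing non-letter characters, collapsing whitespace); the helper `basic_clean` and the sample sentence are hypothetical.

```python
import re

def basic_clean(text):
    """Hypothetical cleaning helper; not the author's prepare.py."""
    text = text.lower()
    # Drop Wikipedia-style citation markers such as [1] or [23]
    text = re.sub(r"\[\d+\]", " ", text)
    # Keep only letters, apostrophes and whitespace
    text = re.sub(r"[^a-z'\s]", " ", text)
    # Collapse runs of whitespace into a single space
    return re.sub(r"\s+", " ", text).strip()

raw = "Meta Platforms, Inc.[1] is an American tech company.[2]"  # made-up sample
cleaned = basic_clean(raw)
print(cleaned)
```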

For each word cloud (one per company), the corresponding text is stored as a string in a variable. Here’s an idea of what the content looks like after cleaning:

  • Meta (57k words): meta_text (Wikipedia Meta Content)
  • Apple (168k words): apple_text (Wikipedia Apple Content)
  • Amazon (112k words): amazon_text
  • Netflix (147k words): netflix_text
  • Google (113k words): google_text

Step 3: Import Libraries

With the essential libraries (mentioned in Step 1) installed and the body of text ready, we need to import them.

Essential Imports

Step 4: Generate WordCloud

Now we have everything we need to generate a word cloud. Create a WordCloud object with a name of your choice and call its generate() method on the text.

Note that WordCloud() takes several arguments that let us customize the shape, color, and more, which we will get into in the examples below.

Generate Default WordCloud

Here I’m naming my word cloud object “wc”. I didn’t pass any arguments to WordCloud(), leaving everything as default. Let’s look at what the simple Meta word cloud looks like by default.

Simple Default WordCloud Setting — Image by Author

As you can see, there are numbers along the axes, and we can turn them off by calling:

Turn Off Axis in matplotlib

WordCloud() accepts parameters including the following:

  • background_color: Color of background
  • max_words: The maximum number of unique words used
  • stopwords: A stopword list to exclude the words you don’t wish to display
  • colormap: The color theme
  • width: The width of the WordCloud image
  • height: The height of the WordCloud image

Step 5: Customize Color

One of the features I like the most is the “colormap” argument, which allows us to customize the color palette however we want. Let’s try using these parameters with the Meta text corpus.

Generate WordCloud with Colormap ‘binary’

Result:

Meta WordCloud with Colormap ‘binary’ — Image by Author

I set the background color to ‘white’; background_color also accepts hex color codes such as ‘#FFFFFF’. I added ‘Meta’ to the stopwords list so that the word cloud displays more relevant text related to Meta.

In this example I use ‘binary’ as the colormap. Here’s the list of values you can choose from (these are matplotlib’s built-in colormap names, so the exact set depends on your matplotlib version):

'Accent', 'Accent_r', 'Blues', 'Blues_r', 'BrBG', 'BrBG_r', 'BuGn', 'BuGn_r', 'BuPu', 'BuPu_r', 'CMRmap', 'CMRmap_r', 'Dark2', 'Dark2_r', 'GnBu', 'GnBu_r', 'Greens', 'Greens_r', 'Greys', 'Greys_r', 'OrRd', 'OrRd_r', 'Oranges', 'Oranges_r', 'PRGn', 'PRGn_r', 'Paired', 'Paired_r', 'Pastel1', 'Pastel1_r', 'Pastel2', 'Pastel2_r', 'PiYG', 'PiYG_r', 'PuBu', 'PuBuGn', 'PuBuGn_r', 'PuBu_r', 'PuOr', 'PuOr_r', 'PuRd', 'PuRd_r', 'Purples', 'Purples_r', 'RdBu', 'RdBu_r', 'RdGy', 'RdGy_r', 'RdPu', 'RdPu_r', 'RdYlBu', 'RdYlBu_r', 'RdYlGn', 'RdYlGn_r', 'Reds', 'Reds_r', 'Set1', 'Set1_r', 'Set2', 'Set2_r', 'Set3', 'Set3_r', 'Spectral', 'Spectral_r', 'Wistia', 'Wistia_r', 'YlGn', 'YlGnBu', 'YlGnBu_r', 'YlGn_r', 'YlOrBr', 'YlOrBr_r', 'YlOrRd', 'YlOrRd_r', 'afmhot', 'afmhot_r', 'autumn', 'autumn_r', 'binary', 'binary_r', 'bone', 'bone_r', 'brg', 'brg_r', 'bwr', 'bwr_r', 'cividis', 'cividis_r', 'cool', 'cool_r', 'coolwarm', 'coolwarm_r', 'copper', 'copper_r', 'cubehelix', 'cubehelix_r', 'flag', 'flag_r', 'gist_earth', 'gist_earth_r', 'gist_gray', 'gist_gray_r', 'gist_heat', 'gist_heat_r', 'gist_ncar', 'gist_ncar_r', 'gist_rainbow', 'gist_rainbow_r', 'gist_stern', 'gist_stern_r', 'gist_yarg', 'gist_yarg_r', 'gnuplot', 'gnuplot2', 'gnuplot2_r', 'gnuplot_r', 'gray', 'gray_r', 'hot', 'hot_r', 'hsv', 'hsv_r', 'inferno', 'inferno_r', 'jet', 'jet_r', 'magma', 'magma_r', 'nipy_spectral', 'nipy_spectral_r', 'ocean', 'ocean_r', 'pink', 'pink_r', 'plasma', 'plasma_r', 'prism', 'prism_r', 'rainbow', 'rainbow_r', 'seismic', 'seismic_r', 'spring', 'spring_r', 'summer', 'summer_r', 'tab10', 'tab10_r', 'tab20', 'tab20_r', 'tab20b', 'tab20b_r', 'tab20c', 'tab20c_r', 'terrain', 'terrain_r', 'turbo', 'turbo_r', 'twilight', 'twilight_r', 'twilight_shifted', 'twilight_shifted_r', 'viridis', 'viridis_r', 'winter', 'winter_r'
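Rather than copying this list, you can ask matplotlib for the colormaps registered in your installation, as sketched below. Every name has a reversed variant ending in “_r”.

```python
import matplotlib.pyplot as plt

names = plt.colormaps()  # all registered colormap names
print(len(names))
print([n for n in names if n.startswith("Bu")])
```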

I will be displaying a few of the colormap values in the following examples, and I encourage you to play around with these color palettes!

Step 6: Customize Shape

Here I’m importing the shape I want to use as the “mask”: the logo of the company Meta.

Open Image for Mask

With the ‘meta_mask’ variable, we open the image ‘meta_logo.png’. As you can see from the output, NumPy converts the PIL image into an array holding the value of each pixel.

‘meta_logo.png’

In this array, ‘255’ indicates pure white and ‘0’ indicates black. Keep in mind that the image you import should be black and white: when generating the word cloud, pure-white pixels are treated as background, and the text fills in the black area.
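Since meta_logo.png isn’t included here, this sketch builds a hypothetical stand-in mask (a black circle on a white canvas) to show the pixel values described above. With the real file you would load it with np.array(Image.open("meta_logo.png")).

```python
import numpy as np
from PIL import Image, ImageDraw

# Hypothetical stand-in for meta_logo.png: a black circle on a white canvas
canvas = Image.new("L", (300, 300), color=255)              # "L" = 8-bit grayscale, 255 = white
ImageDraw.Draw(canvas).ellipse((50, 50, 250, 250), fill=0)  # 0 = black
meta_mask = np.array(canvas)

print(meta_mask.shape)                         # (rows, columns)
print(meta_mask[0, 0], meta_mask[150, 150])    # corner is white, centre is black
print(np.unique(meta_mask))
```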

Now let’s put this mask into our word cloud!

Generate WordCloud with Meta Logo as Mask

Result:

Meta WordCloud with Colormap ‘BuPu_r’ — Image by Author

There’s a lot of room for creativity. Let’s look at what the word clouds look like for the other companies’ logos.

  • Apple
Generate WordCloud with Apple Logo as Mask

Result:

Apple WordCloud with Colormap ‘bone’. Image by Author
  • Amazon
Generate WordCloud with Amazon Logo as Mask

Result:

Amazon WordCloud with Colormap ‘copper’ — Image by Author
  • Netflix
Generate WordCloud with Netflix Logo as Mask

Result:

Netflix WordCloud with Colormap ‘YlOrRd’ — Image by Author
  • Google
Generate WordCloud with Google Logo as Mask

Result:

Google WordCloud with Colormap ‘Paired’ — Image by Author

Here we explored a few colormaps, including ‘binary’, ‘BuPu_r’, ‘bone’, ‘copper’, ‘YlOrRd’, and ‘Paired’, as well as the shapes of the FAANG companies’ logos. There are a ton of other things you can create with different text, shapes, and colors. I can’t wait to see what y’all’s word clouds look like!

I hope this article clearly illustrated how to create word clouds. Please share your questions and thoughts in the comments; I look forward to hearing your ideas!

Meredith Wang

Data Scientist | Entrepreneur | Photographer | Visual Storyteller