WordClouds with Python
A step-by-step guide to create and customize Word Clouds
Word clouds are one of the most powerful and straightforward ways to visualize text data. The size of each word depends on how often it occurs in the text, which serves as a proxy for its importance in context, so word clouds come in handy for Natural Language Processing projects and text analysis. I played around with them for my recent project exploring the Metaverse.
In this article, I will show you how to create word clouds in Python and get creative with them.
Now let’s get started!
Step 1: Install Packages
We will need the wordcloud package to generate the visuals for us. Keep in mind that wordcloud depends on the libraries NumPy and Pillow. We also need matplotlib to display the image or save it locally.
In your terminal (Linux / macOS) or the command prompt (Windows), type one of the commands below, depending on your environment.
If you’re using pip:
pip install wordcloud
If you are using conda:
conda install -c conda-forge wordcloud
Step 2: Body of Text
For this article, I collected the text by scraping Wikipedia pages on topics related to the FAANG companies with Beautiful Soup. If you want to know the detailed steps I took to get the text, please visit this notebook on my GitHub.
For data preparation, I cleaned the text using the following steps:
- Removing special characters
- Tokenization
- Lemmatization
If you want to see the code and function, please visit this prepare.py file.
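As a rough sketch of the first two cleaning steps (this is not the actual code in prepare.py), something like the following would work. The function names and the sample sentence are made up for illustration, lowercasing is my own assumption, and lemmatization is left out because it needs an extra library such as NLTK.

```python
import re

def basic_clean(text):
    """Lowercase the text and replace every character that is not
    a letter, digit, or whitespace with a space."""
    text = text.lower()
    return re.sub(r"[^a-z0-9\s]", " ", text)

def tokenize(text):
    """Split the cleaned text into a list of words on whitespace."""
    return text.split()

# Hypothetical sample sentence, not taken from the scraped data
raw = "Meta Platforms, Inc. (formerly Facebook) owns Instagram & WhatsApp!"

tokens = tokenize(basic_clean(raw))
cleaned_text = " ".join(tokens)  # one string, ready for a word cloud
print(cleaned_text)
# meta platforms inc formerly facebook owns instagram whatsapp
```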
For each word cloud (one per company) I’m creating, the corresponding text is stored in a variable as a string. To give you an idea of how much text there is for each company after cleaning:
- Meta (57k words)
- Apple (168k words)
- Amazon (112k words)
- Netflix (147k words)
- Google (113k words)
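If you want to check the size of your own text, a simple whitespace split is one way to count words, and the same counts show which words will end up largest in the cloud. This is only a sketch: the sample string is a placeholder, and I am assuming the counts above were measured in a similar way.

```python
from collections import Counter

# Placeholder standing in for one company's cleaned text
sample_text = "meta platform company meta social network platform meta"

words = sample_text.split()
print(len(words))                     # total number of words
# 8
print(Counter(words).most_common(2))  # the two most frequent words
# [('meta', 3), ('platform', 2)]
```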
Step 3: Import Libraries
With the essential libraries from Step 1 installed and the body of text ready, we need to import the libraries.
Step 4: Generate WordCloud
Now we have everything we need to generate a word cloud. Create an object of the WordCloud class with a name of your choice and call its generate() method.
Note that WordCloud() takes several arguments that allow us to customize the shape, color, etc., which we will get into in the examples below.
Here I’m naming my word cloud object “wc”. I didn’t pass any arguments to WordCloud(), leaving everything as default. Let’s look at what the simple Meta word cloud looks like by default.
As you can see, there are numbers along the axes. We can turn them off with matplotlib, for example by calling plt.axis('off').
WordCloud() accepts several parameters, including the following:
- background_color: Color of background
- max_words: The maximum number of unique words used
- stopwords: A stopword list to exclude the words you don’t wish to display
- colormap: The color theme
- width: The width of the WordCloud image
- height: The height of the WordCloud image
Step 5: Customize Color
One of the features I like the most is the “colormap” argument, which allows us to customize the color palette. Let’s try using these parameters with the Meta text corpus.
Result:
I set the background color to ‘white’; it also accepts colors in hex code format (‘#FFFFFF’). I added ‘Meta’ to the stopwords list so that the word cloud displays more of the other words related to Meta.
In this example I use ‘binary’ for the colormap. The values are matplotlib colormap names, so the exact set depends on your matplotlib version. Here’s the list of values you can choose from:
'Accent', 'Accent_r', 'Blues', 'Blues_r', 'BrBG', 'BrBG_r', 'BuGn', 'BuGn_r', 'BuPu', 'BuPu_r', 'CMRmap', 'CMRmap_r', 'Dark2', 'Dark2_r', 'GnBu', 'GnBu_r', 'Greens', 'Greens_r', 'Greys', 'Greys_r', 'OrRd', 'OrRd_r', 'Oranges', 'Oranges_r', 'PRGn', 'PRGn_r', 'Paired', 'Paired_r', 'Pastel1', 'Pastel1_r', 'Pastel2', 'Pastel2_r', 'PiYG', 'PiYG_r', 'PuBu', 'PuBuGn', 'PuBuGn_r', 'PuBu_r', 'PuOr', 'PuOr_r', 'PuRd', 'PuRd_r', 'Purples', 'Purples_r', 'RdBu', 'RdBu_r', 'RdGy', 'RdGy_r', 'RdPu', 'RdPu_r', 'RdYlBu', 'RdYlBu_r', 'RdYlGn', 'RdYlGn_r', 'Reds', 'Reds_r', 'Set1', 'Set1_r', 'Set2', 'Set2_r', 'Set3', 'Set3_r', 'Spectral', 'Spectral_r', 'Wistia', 'Wistia_r', 'YlGn', 'YlGnBu', 'YlGnBu_r', 'YlGn_r', 'YlOrBr', 'YlOrBr_r', 'YlOrRd', 'YlOrRd_r', 'afmhot', 'afmhot_r', 'autumn', 'autumn_r', 'binary', 'binary_r', 'bone', 'bone_r', 'brg', 'brg_r', 'bwr', 'bwr_r', 'cividis', 'cividis_r', 'cool', 'cool_r', 'coolwarm', 'coolwarm_r', 'copper', 'copper_r', 'cubehelix', 'cubehelix_r', 'flag', 'flag_r', 'gist_earth', 'gist_earth_r', 'gist_gray', 'gist_gray_r', 'gist_heat', 'gist_heat_r', 'gist_ncar', 'gist_ncar_r', 'gist_rainbow', 'gist_rainbow_r', 'gist_stern', 'gist_stern_r', 'gist_yarg', 'gist_yarg_r', 'gnuplot', 'gnuplot2', 'gnuplot2_r', 'gnuplot_r', 'gray', 'gray_r', 'hot', 'hot_r', 'hsv', 'hsv_r', 'inferno', 'inferno_r', 'jet', 'jet_r', 'magma', 'magma_r', 'nipy_spectral', 'nipy_spectral_r', 'ocean', 'ocean_r', 'pink', 'pink_r', 'plasma', 'plasma_r', 'prism', 'prism_r', 'rainbow', 'rainbow_r', 'seismic', 'seismic_r', 'spring', 'spring_r', 'summer', 'summer_r', 'tab10', 'tab10_r', 'tab20', 'tab20_r', 'tab20b', 'tab20b_r', 'tab20c', 'tab20c_r', 'terrain', 'terrain_r', 'turbo', 'turbo_r', 'twilight', 'twilight_r', 'twilight_shifted', 'twilight_shifted_r', 'viridis', 'viridis_r', 'winter', 'winter_r'
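Rather than copying this list, you can ask matplotlib for the names available in your own installation. This is a small sketch using plt.colormaps(), which returns the registered colormap names.

```python
import matplotlib.pyplot as plt

names = plt.colormaps()  # list of registered colormap names
print(len(names))        # how many are available in this installation

# Check that a name exists before passing it to WordCloud(colormap=...)
print("binary" in names, "BuPu_r" in names)
```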
I will be displaying a few of the colormap values in the following examples, and I encourage you to play around with these color palettes!
Step 6: Customize Shape
Here I’m importing the shape I want to use for the “mask”, which is the logo for company Meta:
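A sketch of this step is below. With the logo file on disk, a single line with Image.open and np.array is enough (shown as a comment). So that the example runs without the file, I draw a stand-in shape with Pillow instead; the ellipse and its size are made up and are not the Meta logo.

```python
import numpy as np
from PIL import Image, ImageDraw

# With a real logo file on disk, the mask would be loaded like this:
# meta_mask = np.array(Image.open("meta_logo.png"))

# Stand-in so the example runs without the file:
# a black ellipse on a white 8-bit grayscale canvas
canvas = Image.new("L", (400, 200), color=255)              # 255 = white
ImageDraw.Draw(canvas).ellipse((50, 25, 350, 175), fill=0)  # 0 = black
meta_mask = np.array(canvas)                                # PIL image -> NumPy array

print(meta_mask.shape)                   # (height, width)
# (200, 400)
print(meta_mask.min(), meta_mask.max())  # darkest and brightest pixel values
# 0 255
```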
With the ‘meta_mask’ variable, we have opened the image ‘meta_logo.png’. As you can see from the output, NumPy converts the PIL image into an array that holds the value of each pixel of the image.
In the array, ‘0’ indicates pure black and ‘255’ indicates pure white. The simplest mask to work with is a black shape on a white background. When generating the word cloud, the text fills the black (non-white) area, and the pure white area is treated as background.
Now let’s put this mask into our word cloud!
Result:
There’s a lot of room for creativity. Let’s look at what the word clouds look like with the other companies’ logos.
- Apple
Result:
- Amazon
Result:
- Netflix
Result:
- Google
Result:
Here we explored a few colormaps, including ‘binary’, ‘BuPu_r’, ‘bone’, ‘copper’, ‘YlOrRd’, and ‘Paired’, as well as shapes based on the FAANG companies’ logos. There are a ton of other things you can create with different text, shapes, and colors. I can’t wait to see what y’all’s word clouds look like!
I hope this article clearly illustrated how to create word clouds. Please share your questions and thoughts in the comments. I look forward to hearing your ideas!