WordCloud Based On Image
A wordcloud is a visual representation of a collection of words in which each word's size reflects how often it occurs in the dataset: the more often a word appears, the larger it is drawn. It helps in understanding the sentiment of a text or spotting which words occur most often in a dataset.
wordcloud is an open-source Python library for creating wordclouds. In this article, we will use different images as the shapes of our wordclouds and also explore different datasets and types of wordcloud.
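The idea that frequent words are drawn larger can be sketched in plain Python. This is a simplified model, not the library's code: it counts words, normalizes the counts by the most frequent word (wordcloud also normalizes this way), and then applies a size rule modeled on wordcloud's relative_scaling parameter, which appears in the parrot example below. It ignores the step where the real library shrinks words that do not fit. The function names word_frequencies and font_sizes are hypothetical.

```python
import re
from collections import Counter


def word_frequencies(text, stopwords=frozenset()):
    """Count words and normalize so the most common word has frequency 1.0."""
    words = re.findall(r"\w[\w']*", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    if not counts:
        return {}
    top = max(counts.values())
    # most_common() yields words in descending order of count
    return {w: c / top for w, c in counts.most_common()}


def font_sizes(frequencies, max_font_size=40, relative_scaling=0.5):
    """Assign font sizes in descending-frequency order (simplified sketch).

    Each size is derived from the previous word's size:
    relative_scaling=1 makes size roughly proportional to frequency,
    relative_scaling=0 keeps every word at the same size.
    """
    sizes = {}
    size = max_font_size
    last_freq = None
    for word, freq in frequencies.items():
        if last_freq is not None:
            ratio = freq / last_freq
            size = round((relative_scaling * ratio + (1 - relative_scaling)) * size)
        sizes[word] = size
        last_freq = freq
    return sizes


freqs = word_frequencies("parrot parrot parrot rainbow rainbow sky")
print(font_sizes(freqs, max_font_size=40, relative_scaling=1.0))
```

With relative_scaling=1.0 the sizes fall off with frequency; with relative_scaling=0 every word keeps max_font_size, which is why the parrot example's comment says frequencies are reflected less accurately at 0.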
Let’s get started…
Installing required libraries
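We will start by installing wordcloud using pip. The parrot example also imports SciPy for edge detection, which is not installed as a dependency of wordcloud, so the command below installs it as well.
pip install wordcloud scipy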
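The code in this article imports wordcloud, NumPy, Pillow (imported as PIL), Matplotlib and SciPy. As a small stdlib-only sketch, you can check which of them are still missing before running anything; the helper name missing_packages and the REQUIRED mapping are illustrative, not part of any library.

```python
import importlib.util

# pip package name -> name used in the import statement
REQUIRED = {
    "wordcloud": "wordcloud",
    "numpy": "numpy",
    "Pillow": "PIL",
    "matplotlib": "matplotlib",
    "scipy": "scipy",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing, install with: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```

find_spec only looks the module up without importing it, so the check is fast even for heavy packages like SciPy.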
Importing required libraries
In this step, we will import the libraries and functions required to create the wordcloud.
import os
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import gaussian_gradient_magnitude
from wordcloud import WordCloud, ImageColorGenerator
Creating wordcloud image
Now we will create the wordcloud. You can use any text dataset and any image to make it. I am using text from Wikipedia (wiki_rainbow.txt) and the image of a parrot to create a wordcloud in the shape of a parrot.
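Whatever image you choose, the mask must follow wordcloud's convention: pure white (255) pixels are "masked out" and every other pixel is free for words. As a hedged sketch that needs no image file, the snippet below builds a circular mask with NumPy; circle_mask and its default size and radius are hypothetical choices for illustration.

```python
import numpy as np


def circle_mask(size=300, radius=130):
    """Build a mask in wordcloud's convention: 255 (white) = masked out."""
    # ogrid returns a column of row indices and a row of column indices,
    # which broadcast into a full size x size grid below.
    y, x = np.ogrid[:size, :size]
    center = size // 2
    outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
    # Pixels outside the circle become 255 (no words); inside stay 0 (drawable).
    return (255 * outside).astype(np.uint8)


mask = circle_mask()
print(mask.shape, mask[150, 150], mask[0, 0])  # (300, 300) 0 255
```

Passing this array as WordCloud(mask=mask) would place the words inside the circle. The parrot code relies on the same rule when it sets black background pixels and strong edges to 255.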
# directory holding wiki_rainbow.txt and parrot1.jpg (assumed here: the current working directory)
d = os.getcwd()
# load Wikipedia text
with open(os.path.join(d, 'wiki_rainbow.txt'), encoding="utf-8") as f:
    text = f.read()
# load image. This has been modified in GIMP to be brighter and have more saturation.
parrot_color = np.array(Image.open(os.path.join(d, "parrot1.jpg")))
# subsample by a factor of 3. Very lossy, but for a wordcloud we don't really care.
parrot_color = parrot_color[::3, ::3]
# create mask: white is "masked out", so black background pixels become white
parrot_mask = parrot_color.copy()
parrot_mask[parrot_mask.sum(axis=2) == 0] = 255
# some finesse: we enforce boundaries between colors so they get less washed out.
# For that we do some edge detection in the image and mask out strong edges.
edges = np.mean([gaussian_gradient_magnitude(parrot_color[:, :, i] / 255., 2) for i in range(3)], axis=0)
parrot_mask[edges > .08] = 255
# create wordcloud. A bit sluggish; you can subsample more strongly for quicker rendering.
# relative_scaling=0 means the frequencies in the data are reflected less
# accurately, but it makes a better picture
wc = WordCloud(max_words=2000, mask=parrot_mask, max_font_size=40, random_state=42, relative_scaling=0)
# generate word cloud from the text
wc.generate(text)
# create coloring from image and apply it to the words
image_colors = ImageColorGenerator(parrot_color)
wc.recolor(color_func=image_colors)
# display the result
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
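ImageColorGenerator is what gives the words the parrot's colors: roughly, each word is colored with the average color of the image region under its bounding box. The sketch below reproduces only that averaging step with NumPy. average_color is a hypothetical helper; the real class also works out each word's box from its font and size, which is omitted here.

```python
import numpy as np


def average_color(image, row, col, height, width):
    """Return the mean RGB color of an image patch as an 'rgb(r, g, b)' string."""
    # keep only the first three channels in case the image has alpha
    patch = image[row:row + height, col:col + width, :3]
    pixels = patch.reshape(-1, 3)
    if pixels.size == 0:
        raise ValueError("patch lies outside the image")
    r, g, b = pixels.mean(axis=0)
    return "rgb(%d, %d, %d)" % (int(r), int(g), int(b))


# 4x4 test image: left half red, right half blue
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[:, :2] = (255, 0, 0)
img[:, 2:] = (0, 0, 255)
print(average_color(img, 0, 0, 4, 2))  # rgb(255, 0, 0)
print(average_color(img, 0, 0, 4, 4))  # rgb(127, 0, 127)
```

Because a word spanning two colored areas gets their average, the edge masking in the parrot code keeps words from straddling color boundaries and looking washed out.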
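Here you can see how we created a wordcloud in the shape of a parrot. We can also create wordclouds from the command line with the wordcloud_cli tool that is installed along with the library. In the example below, --mask gives the shape image and --imagefile is the file the finished PNG is written to.
wordcloud_cli --text alice.txt --mask alice_mask.png --imagefile alice_wordcloud.png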
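If you want to drive the CLI from a Python script, for example to batch several texts, a minimal sketch with subprocess could look like this. The helper names cli_command and run_cli and the output name alice_wordcloud.png are hypothetical; --text, --mask and --imagefile are existing wordcloud_cli options, but run wordcloud_cli --help to confirm them for your installed version.

```python
import shutil
import subprocess


def cli_command(text_file, mask_file, output_file):
    """Build the wordcloud_cli argument list for a masked wordcloud."""
    return [
        "wordcloud_cli",
        "--text", text_file,          # input text file
        "--mask", mask_file,          # shape image; white areas stay empty
        "--imagefile", output_file,   # where the finished PNG is written
    ]


def run_cli(text_file, mask_file, output_file):
    """Run wordcloud_cli if it is on PATH; return True only if it succeeded."""
    if shutil.which("wordcloud_cli") is None:
        return False
    result = subprocess.run(cli_command(text_file, mask_file, output_file))
    return result.returncode == 0


if __name__ == "__main__":
    print(" ".join(cli_command("alice.txt", "alice_mask.png", "alice_wordcloud.png")))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.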
This is how we can create a wordcloud image using the wordcloud library. Go ahead and try it, and let me know your comments in the response section.
This article is in collaboration with Piyush Ingale.
Before You Go
Thanks for reading! If you want to get in touch with me, feel free to reach me at email@example.com or on my LinkedIn profile. You can view my GitHub profile for different data science projects and package tutorials. Also, feel free to explore my profile and read other articles I have written related to data science.