Generating Wordcloud using Python

Alka Mehta
Mar 16, 2022

Wordcloud on Squid Game Netflix Twitter Data

Day 1 of #100DaysofDataProjects

Here is a small project that creates a word cloud from tweets about the popular Netflix show Squid Game.

What is a Wordcloud?
A word cloud (or tag cloud) is a visual representation of text data, often used to depict key metadata or to visualize free-form text. Tags are usually single words, and their size represents their significance, i.e., words that occur more frequently are drawn larger.

Reference: https://en.wikipedia.org/wiki/Tag_cloud
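To make the "size follows frequency" idea concrete, here is a minimal sketch that counts words with collections.Counter and divides each count by the count of the most frequent word, giving a weight between 0 and 1. It only illustrates the principle; the wordcloud library's own font-size scaling (see its relative_scaling parameter) is more involved. The function name relative_weights and the sample sentence are hypothetical.

```python
from collections import Counter


def relative_weights(text):
    """Map each word to its count divided by the top count (0 < weight <= 1).

    In a word cloud, words with larger weights are drawn in a bigger font.
    """
    counts = Counter(text.lower().split())
    if not counts:
        return {}
    top = max(counts.values())
    return {word: count / top for word, count in counts.items()}


sample = "red light green light red light squid game"
print(relative_weights(sample))
# 'light' gets 1.0, 'red' about 0.67, every other word about 0.33
```

A dictionary like this can also be handed to the wordcloud library directly through WordCloud.generate_from_frequencies instead of passing raw text.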

This Squid Game dataset can be downloaded from Kaggle.

A snapshot of the dataset:

Snapshot of the dataset

Generating a simple word cloud:

I started by looking at the columns and identifying which ones I would need. For the word cloud, I only needed the tweet text, so I focused on that.
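Pulling out just the tweet text can be sketched with Python's built-in csv module (pandas.read_csv(path)["text"].dropna() would work equally well). This is a sketch under assumptions: the column name "text" is assumed to match the Kaggle file's header, and the small in-memory CSV below stands in for the real file.

```python
import csv
import io


def load_tweet_texts(file_obj, column="text"):
    """Return the non-empty values of one column from a CSV file object.

    The default column name "text" is an assumption about the dataset header.
    """
    reader = csv.DictReader(file_obj)
    return [row[column] for row in reader if row.get(column)]


# Hypothetical CSV standing in for the real dataset file.
demo_csv = io.StringIO(
    "user_name,text,is_retweet\n"
    "alice,Just finished Squid Game episode one!,False\n"
    "bob,,False\n"
)
print(load_tweet_texts(demo_csv))  # ['Just finished Squid Game episode one!']

# For the real file (the path is hypothetical):
# with open("squid_game_tweets.csv", newline="", encoding="utf-8") as f:
#     tweets = load_tweet_texts(f)
```

Opening the file with newline="" lets the csv module handle tweets that contain line breaks inside quoted fields.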

Next, I cleaned the text by removing common and custom stopwords and normalizing word forms, using stopwords and WordNetLemmatizer from the NLTK library together with Python's re module.

NLTK: The Natural Language Toolkit is a suite of libraries and programs for symbolic and statistical natural language processing of English.

Stopwords: words that add little meaning and can be ignored without changing what is being said. NLTK provides a ready-made list of stopwords.

WordNetLemmatizer: reduces the different inflected forms of a word to a common base form (for example, "players" becomes "player") so they can be analyzed as one item.

re: Python's built-in module for regular expression matching operations.

Source: https://en.wikipedia.org/wiki/Natural_Language_Toolkit
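The cleaning steps above can be sketched as one function. This is a minimal sketch, not the author's exact code: the regular expressions, the hypothetical CUSTOM_STOPWORDS set, and the helper names clean_tweet and nltk_tools are assumptions. NLTK is imported only inside nltk_tools, so the cleaning logic itself can be tried with any stopword set.

```python
import re

# Hypothetical dataset-specific stopwords, added on top of NLTK's English list.
CUSTOM_STOPWORDS = {"squidgame", "netflix", "amp", "rt"}


def clean_tweet(text, stop_words, lemmatize=lambda word: word):
    """Clean one tweet and return its remaining words joined by spaces.

    Lowercases, removes URLs and @mentions, replaces everything that is not a
    letter with a space, drops stopwords, lemmatizes, and discards words
    shorter than three letters.
    """
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"@\w+", " ", text)  # @mentions
    text = re.sub(r"[^a-z\s]", " ", text)  # digits, punctuation, emoji, '#'
    words = (lemmatize(w) for w in text.split() if w not in stop_words)
    return " ".join(w for w in words if len(w) > 2)


def nltk_tools(extra_stopwords=CUSTOM_STOPWORDS):
    """Return (stopword set, lemmatize function) built from NLTK."""
    import nltk  # pip install nltk
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer

    for package in ("stopwords", "wordnet", "omw-1.4"):
        nltk.download(package, quiet=True)
    stop_words = set(stopwords.words("english")) | set(extra_stopwords)
    return stop_words, WordNetLemmatizer().lemmatize


# Demo with a tiny stopword set so it runs without NLTK installed:
demo_stops = {"the", "is"} | CUSTOM_STOPWORDS
print(clean_tweet("The red light is ON!! @netflix https://t.co/x #SquidGame", demo_stops))
# -> red light

# With NLTK (tweets is the hypothetical list of raw tweet strings):
# stop_words, lemmatize = nltk_tools()
# cleaned = [clean_tweet(t, stop_words, lemmatize) for t in tweets]
```

Stopwords are filtered before lemmatizing on purpose: WordNetLemmatizer.lemmatize treats words as nouns by default, so a stopword like "was" would otherwise turn into "wa" and slip past the filter.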

Then I generated the word cloud from the cleaned text with the wordcloud library and displayed the resulting image using matplotlib.

Squid Game Tweets Wordcloud
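A minimal sketch of this step, using WordCloud.generate from the wordcloud package and matplotlib for display. The helper names, the 800×400 size, and the white background are assumptions rather than the author's settings. Note that WordCloud.generate tokenizes the string itself, and by default it also removes the library's built-in STOPWORDS list and keeps common two-word collocations.

```python
def build_wordcloud(text, **options):
    """Generate a word cloud from one big string of cleaned text.

    Extra keyword arguments (width, height, background_color, max_words,
    mask, ...) are passed straight to wordcloud.WordCloud.
    """
    from wordcloud import WordCloud  # pip install wordcloud

    settings = {"width": 800, "height": 400, "background_color": "white"}
    settings.update(options)
    return WordCloud(**settings).generate(text)


def show_wordcloud(cloud):
    """Display a generated word cloud with matplotlib, without axes."""
    import matplotlib.pyplot as plt  # pip install matplotlib

    plt.figure(figsize=(10, 5))
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()


# Usage (cleaned_tweets is a hypothetical list of cleaned tweet strings):
# cloud = build_wordcloud(" ".join(cleaned_tweets))
# show_wordcloud(cloud)
```

Joining all tweets into one string is the simplest input for generate; WordCloud counts the words across the whole string.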

Masking word cloud with images:

Since I was working with tweet data, I decided to shape the word cloud like the Twitter logo. To do this, I used a PNG image of the logo as a mask. Below is the result.

Squid Game Tweets Wordcloud in Twitter Mask
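One practical detail when using a PNG logo as a mask: WordCloud treats pure-white (255) mask pixels as areas where it must not draw, but many logo PNGs have a transparent background instead of a white one. The sketch below assumes such a PNG (the file name twitter_logo.png is hypothetical), turns mostly transparent pixels white with NumPy, and loads the file with Pillow. The alpha threshold of 128 and the contour settings are arbitrary choices.

```python
import numpy as np


def transparent_to_white(rgba, alpha_threshold=128):
    """Turn a transparent PNG background into pure white.

    WordCloud does not draw on pure-white mask pixels, so transparent
    background pixels are set to white. rgba: array of shape (h, w, 4).
    Returns a uint8 RGB array of shape (h, w, 3).
    """
    rgba = np.asarray(rgba, dtype=np.uint8)
    background = rgba[..., 3:4] < alpha_threshold  # shape (h, w, 1), broadcasts
    return np.where(background, 255, rgba[..., :3]).astype(np.uint8)


def load_mask(path):
    """Load a PNG logo as an RGB mask array suitable for WordCloud(mask=...)."""
    from PIL import Image  # pip install pillow

    with Image.open(path) as image:
        return transparent_to_white(np.array(image.convert("RGBA")))


# Usage (file name and colors are assumptions):
# from wordcloud import WordCloud
# mask = load_mask("twitter_logo.png")
# cloud = WordCloud(mask=mask, background_color="white",
#                   contour_width=2, contour_color="#1DA1F2").generate(text)
```

When a mask is given, WordCloud ignores width and height and uses the mask's shape instead, so the output image has the same size as the logo file.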

Finally, I wanted the word cloud to use the colors of the mask. So I used the recolor functionality to change the default colors to the Twitter logo's blue. This is the final result.

Squid Game Tweets Wordcloud in Twitter Mask and its color
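Recoloring can be sketched in two ways, assuming the cloud was built with the mask array from the masking step. WordCloud.recolor keeps the existing word layout and only changes colors. ImageColorGenerator takes each word's color from the image region under it; twitter_blue is a hypothetical fixed-color function, and #1DA1F2 is an assumed hex value for the Twitter blue. The library's get_single_color_func is another option if shades of one color are preferred.

```python
def recolor_from_mask(cloud, mask):
    """Recolor words using the colors of the mask image under each word.

    ImageColorGenerator samples the image, so the mask must be at least as
    large as the cloud's canvas (true when the cloud was built with mask=mask).
    """
    from wordcloud import ImageColorGenerator  # pip install wordcloud

    return cloud.recolor(color_func=ImageColorGenerator(mask))


def twitter_blue(word, font_size, position, orientation, random_state=None, **kwargs):
    """color_func that paints every word the same blue (#1DA1F2 is assumed)."""
    return "#1DA1F2"


def recolor_solid(cloud):
    """Recolor all words with one fixed color, keeping the layout unchanged."""
    return cloud.recolor(color_func=twitter_blue)


# Usage (cloud and mask come from the earlier steps):
# cloud = recolor_from_mask(cloud, mask)   # colors sampled from the logo
# cloud = recolor_solid(cloud)             # or: every word in one blue
```

Because recolor does not recompute word positions, trying different color schemes is cheap compared with generating the cloud again.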

The generated images can easily be saved with the to_file method of WordCloud.
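A small sketch of saving the result; the helper name save_wordcloud and the output path are hypothetical. WordCloud.to_file renders the cloud at its own pixel size (width × height, or the mask's size) and lets Pillow pick the image format from the file extension, whereas plt.savefig would save the whole matplotlib figure instead.

```python
import os


def save_wordcloud(cloud, path):
    """Write a generated word cloud to an image file and return the path.

    Creates the parent folder if needed; the image format follows the
    file extension (e.g. .png).
    """
    folder = os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    cloud.to_file(path)
    return path


# Usage (cloud comes from the earlier steps; the path is hypothetical):
# save_wordcloud(cloud, "output/squid_game_twitter_wordcloud.png")
```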

Here are my findings:

From this analysis, many tweets contain words like good, love, and like, which suggests that many reviews have a positive sentiment.

Words like player, green light, red light, play, and life also stand out, which is consistent with the theme of the show.

"Episode one" is drawn very large, so it may be the favorite episode, although further analysis is needed to confirm this.
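A word cloud only shows relative frequency, so a claim like "episode one is the favorite" is easier to judge with actual counts. As a possible follow-up (an assumption, not part of the original analysis), the sketch below counts "episode <n>" phrases with collections.Counter. It should run on the raw tweets, because a cleaning step that strips digits would hide "episode 1". Phrases such as green light and red light likely appear as units in the cloud because WordCloud includes two-word collocations by default (collocations=True).

```python
import re
from collections import Counter

# Number words mapped to digits so "episode one" and "episode 1" are merged.
# Season 1 has nine episodes, so the map stops at nine (an assumption).
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
                "six": "6", "seven": "7", "eight": "8", "nine": "9"}
EPISODE_PATTERN = re.compile(r"\bepisode\s+(\w+)")


def episode_mentions(texts):
    """Count mentions like 'episode one' or 'episode 2' across raw tweets."""
    counts = Counter()
    for text in texts:
        for number in EPISODE_PATTERN.findall(text.lower()):
            if number.isdigit():
                counts[f"episode {number}"] += 1
            elif number in NUMBER_WORDS:
                counts[f"episode {NUMBER_WORDS[number]}"] += 1
    return counts


tweets = ["Episode one was intense", "episode 1 >>> everything",
          "Episode two... wow", "no spoilers"]
print(episode_mentions(tweets).most_common())
# [('episode 1', 2), ('episode 2', 1)]
```

Counts like these only measure how often an episode is mentioned, not whether the mentions are positive, so they would pair naturally with the planned sentiment analysis.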

Next, I plan to do exploratory and sentiment analysis on the same dataset.

#dataanalysis #python #datascience #wordcloud #twitter #squidgameanalysis #netflixdata

Starting my #100daysofproject on data science, data exploration, and visualization