Building Your First Wordcloud with Google Colaboratory and Python

By elvis
Published in DAIR.AI
4 min read · Aug 13, 2018

The goal of this tutorial is to teach you how to create a word cloud using Python and Google Colaboratory (Colab). You will learn how to leverage the free storage provided by Google Drive and the powerful, free programming environment of Google Colab. Have a peek at what we will build here.

This is a simple use case showing how you can take advantage of Colab notebooks and Google Drive for your data science projects, all for free. After this tutorial, you should be able to work on real-life data science problems without having to manage a computing environment. The best part is that you can share your projects with the world at the click of a button. Let’s get started!

Requirements

  1. First, you will need a Google account to access both Google Colab and Google Drive. Colab notebooks are stored directly on your Google Drive.
  2. You will also need a text file for which you want to generate a word cloud. Make sure to upload your desired text file with extension .txt into your Google Drive, preferably in the same folder where you are storing your Colab notebooks.

Steps

Before you begin, here is a list of the tasks you will be working on:

  • Import the necessary Python libraries to run your code successfully
  • Read your own files into the Colab environment from Google Drive
  • Compute a few statistics on the imported file
  • Generate a word cloud with the wordcloud library
  • Bonus: Generate a masked word cloud of your choice

Importing Main Libraries

Google Colaboratory comes with many common Python packages preinstalled, but sometimes you will need to install a library yourself, such as the wordcloud library, which we will use to generate word clouds. You can install packages directly within the Colab notebook with the !pip install package_name command as follows:
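In a notebook cell, the install step is the single line `!pip install wordcloud`. As a rough sketch of the same idea in plain Python, the helper below installs a package only when it cannot already be imported. The names `is_installed` and `ensure_installed` are made up for this example and are not part of Colab.

```python
import importlib.util
import subprocess
import sys


def is_installed(module_name):
    """Return True if the module can be found by the import system."""
    return importlib.util.find_spec(module_name) is not None


def ensure_installed(package_name, module_name=None):
    """Install package_name with pip if its module is missing.

    Returns True if an install was attempted, False if nothing was needed.
    The pip name and the import name can differ (for example Pillow vs PIL),
    so module_name can be given separately.
    """
    module_name = module_name or package_name
    if is_installed(module_name):
        return False
    # Use the interpreter that is running the notebook, so the package
    # lands in the same environment.
    subprocess.check_call([sys.executable, "-m", "pip", "install", package_name])
    return True


# In Colab, the equivalent one-liner in a cell is:
#   !pip install wordcloud
# With the helper above it would be:
#   ensure_installed("wordcloud")
```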

Now you can import the necessary libraries you will need to create your first word cloud:
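The exact import list from the original notebook is not reproduced here. A plausible minimal set for the steps in this tutorial, assuming matplotlib for plotting and NumPy plus Pillow for the mask image, might look like the sketch below. The `LIBRARIES_READY` flag is a hypothetical convenience so the cell gives a readable message instead of a traceback when something is missing.

```python
try:
    import matplotlib.pyplot as plt            # displaying the word cloud
    import numpy as np                         # turning the mask image into an array
    from PIL import Image                      # opening the silhouette image
    from wordcloud import WordCloud, STOPWORDS  # building the word cloud itself
    LIBRARIES_READY = True
except ImportError as err:
    # Typically means the pip install step has not been run yet.
    LIBRARIES_READY = False
    print(f"Missing library: {err.name}. Install it with pip and re-run this cell.")
```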

Connecting Colab with Google Drive

To create the word cloud, you need a text file, which you can upload directly to Google Drive and then import into the Colab environment. The function below does that for you:

When you run the command below, you will be provided a link that will authenticate your account and request access to your Google Drive.
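The original function is not shown here. One way this step has commonly been done in Colab is with the PyDrive client, and the sketch below assumes that approach. Newer runtimes may need a different client, so treat the imports as illustrative. The name `authenticate` is chosen for this example.

```python
def authenticate():
    """Authorize this notebook and return a PyDrive GoogleDrive client."""
    # Imported inside the function because these modules are only
    # expected to be available on a Colab runtime.
    from google.colab import auth
    from oauth2client.client import GoogleCredentials
    from pydrive.auth import GoogleAuth
    from pydrive.drive import GoogleDrive

    auth.authenticate_user()  # Colab asks you to sign in and grant access here
    gauth = GoogleAuth()
    gauth.credentials = GoogleCredentials.get_application_default()
    return GoogleDrive(gauth)


# In Colab you would then run:
#   drive = authenticate()
```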

Read a Text File into Colab

After authenticating with Google Drive, you can import text files and work with them directly in Colab.

The function below allows you to read the text file which you will use to create the word cloud:
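A minimal sketch of such a function, assuming `drive` is a PyDrive `GoogleDrive` client obtained during authentication. Passing `drive` as an explicit argument is a choice made for this example, so the function does not depend on a global variable.

```python
def read_file(file_id, drive):
    """Return the contents of the Drive file with this id as a string."""
    # CreateFile with an existing id gives a handle to that file;
    # nothing is created on Drive.
    remote_file = drive.CreateFile({"id": file_id})
    return remote_file.GetContentString()
```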

Before you run the command that follows, you should have already uploaded your text file to Google Drive. Once the file is stored on your Google Drive, all you need to do is copy the file id from the file's shareable link. In the animation below, I demonstrate how to obtain the file id (the ending portion of the shareable link):

Then you just pass the file_id to the read_file function as shown below:
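If you would rather not pick the id out of the link by hand, a small helper can do it. The sketch below is a hypothetical convenience, not part of the original notebook, and it assumes the link looks like either `https://drive.google.com/open?id=...` or `https://drive.google.com/file/d/.../view`. The link in the usage comment is a placeholder.

```python
from urllib.parse import parse_qs, urlparse


def extract_file_id(share_link):
    """Pull the file id out of a Google Drive shareable link."""
    parsed = urlparse(share_link)

    # Form 1: .../open?id=FILE_ID
    query_id = parse_qs(parsed.query).get("id")
    if query_id:
        return query_id[0]

    # Form 2: .../file/d/FILE_ID/view
    parts = [p for p in parsed.path.split("/") if p]
    if "d" in parts and parts.index("d") + 1 < len(parts):
        return parts[parts.index("d") + 1]

    raise ValueError(f"Could not find a file id in: {share_link}")


# In the notebook, with `drive` and `read_file` already defined:
#   file_id = extract_file_id("https://drive.google.com/open?id=YOUR_FILE_ID")
#   content = read_file(file_id, drive)
```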

Read the File and Explore it

Once you have successfully read the text file into the Colab environment, you are ready to explore it and generate your first word cloud.

First, we explore the text file to see what it contains:
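A minimal sketch of this kind of exploration, assuming the file contents are held in a string called `content`. It prints the number of characters and a short preview. The helper name `explore` and the 100-character default are assumptions made for this example.

```python
def explore(text, preview_chars=100):
    """Print the character count and the first few characters of the text."""
    preview = text[:preview_chars]
    print(len(text))
    print(preview)
    return len(text), preview


# In the notebook, with `content` holding the text read from Drive:
#   explore(content)
```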

This outputs:

148570

Project Gutenberg’s Alice’s Adventures in Wonderland, by Lewis Carroll This eBook is for the u

Generating Word Cloud

Then you can generate the word cloud as follows:
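Here is one way this step might look with the wordcloud library and matplotlib. The image size, white background, and use of the library's built-in `STOPWORDS` set are assumptions for this sketch rather than the settings from the original notebook. The imports sit inside the functions so the cell can be defined even before the install step has been run.

```python
def build_wordcloud(text, width=800, height=400):
    """Create a WordCloud object from raw text."""
    from wordcloud import STOPWORDS, WordCloud

    cloud = WordCloud(
        width=width,
        height=height,
        background_color="white",
        stopwords=STOPWORDS,  # drop common words such as "the" and "and"
    )
    return cloud.generate(text)


def show_wordcloud(cloud):
    """Display a generated word cloud without axes."""
    import matplotlib.pyplot as plt

    plt.figure(figsize=(10, 5))
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()


# In the notebook, with `content` holding the text read from Drive:
#   show_wordcloud(build_wordcloud(content))
```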

The resulting word cloud looks something like this:

Generate A Masked Wordcloud

With the wordcloud library you can also generate customized word clouds based on an image silhouette. These types of clouds are also called masked word clouds. To do this, you will first need to upload a silhouette image of your choice to Google Drive. Then you can get the file id from the shareable link, just as we did for our text file (see animation above). You can get a silhouette image directly from Google Images.

The code below takes care of generating the masked word cloud:
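A hedged sketch of the masked version, again assuming a PyDrive `GoogleDrive` client for the download. The helper names, the local file name in the usage comment, and the contour styling are choices made for this example. In the wordcloud library, pure white areas of the mask are left empty and words are drawn everywhere else, so a dark silhouette on a white background works well.

```python
def download_file(file_id, drive, local_path):
    """Save the Drive file with this id to local_path and return the path."""
    drive.CreateFile({"id": file_id}).GetContentFile(local_path)
    return local_path


def build_masked_wordcloud(text, mask_path):
    """Create a word cloud shaped by the silhouette image at mask_path."""
    import numpy as np
    from PIL import Image
    from wordcloud import STOPWORDS, WordCloud

    # The mask is passed to WordCloud as a NumPy array of pixel values.
    mask = np.array(Image.open(mask_path))
    cloud = WordCloud(
        background_color="white",
        mask=mask,
        stopwords=STOPWORDS,
        contour_width=1,           # thin outline around the silhouette
        contour_color="steelblue",
    )
    return cloud.generate(text)


# In the notebook, with `drive` and `content` already defined:
#   mask_path = download_file("YOUR_IMAGE_FILE_ID", drive, "mask.png")
#   cloud = build_masked_wordcloud(content, mask_path)
```

The result can be displayed with matplotlib's `imshow`, the same way as the unmasked word cloud.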

The silhouette and the resulting masked word cloud look like this:

That’s it for word clouds for now. If you made it through the entire tutorial, congratulations. Hopefully, you are now more open to using Google Colab and have seen how powerful this tool can be and how it can help you develop your own data science projects in the future.

The instructions and code are also available here. Until next time!
