Creating a Tag Cloud Using Python Graphics

Text data visualization made easy

Aryaman Kukal
The Academically Driven
6 min read · Jul 2, 2020

Yet again, a Python Turtle graphics project has been circling in my mind for some time now.

The concept is called a tag cloud, also known as a word cloud. It’s a novelty visual representation of text data, typically used to depict keyword metadata on websites or to visualize free-form text. Tags are usually single words, and the importance of each tag is shown with font size or color.

Let’s look at an example with a famous excerpt from Nicholson Baker’s The Everlasting Story of Nory.

“Nory was a Catholic because her mother was a Catholic, and Nory’s mother was a Catholic because her father was a Catholic, and her father was a Catholic because his mother was a Catholic, or had been.”

There are many different words in the sentence, but “Catholic”, “was”, and “a” each appear a surprising six times.

Yes, words such as “her” and “mother” are also repeated a few times, but those three words repeat the most, so they should appear the largest on the screen.

A tag cloud is designed to illustrate data containing only keywords. Hence, all punctuation was removed from the sentence, except punctuation inside words, such as apostrophes.
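As a rough illustration of that cleanup step, here is a minimal sketch of how the punctuation could be stripped while keeping apostrophes. The helper name strip_punctuation is hypothetical and not part of the original program, which may well have cleaned the sentence by hand.

```python
import string

def strip_punctuation(text):
    """Remove punctuation but keep apostrophes (hypothetical helper)."""
    # Every ASCII punctuation character except the straight apostrophe.
    unwanted = string.punctuation.replace("'", "")
    # Curly double quotes are not in string.punctuation, so add them by hand.
    unwanted += "\u201c\u201d"
    # The third argument of str.maketrans lists characters to delete.
    return text.translate(str.maketrans("", "", unwanted))

print(strip_punctuation("Nory's mother was a Catholic, or had been."))
# Nory's mother was a Catholic or had been
```

Note that this keeps every apostrophe, including ones used as single quotation marks around a word, so it is only a starting point.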

The version of the program I created is essentially this: once the count of each unique word is calculated, the words are scattered at random positions on a 1500 by 1000 pixel screen, in random colors, with each word’s size corresponding to how often it appears in the phrase. That’s it!

Let’s begin with the code.

1 — I imported the Turtle library to carry out all Turtle processes.

2 — I imported the random library to take care of certain tasks with the random.randint() function, to select random integers later on in the program.

3 — The .Screen() function returns a singleton object of a TurtleScreen subclass and creates the screen the graphics will be displayed on.

4 — This is an optional line that simply gives the Turtle Screen a name, which shows up in the top bar.

5 — This is also completely optional. It sets the background color of the Screen; I find black the most visually appealing.

6 — This sets the width and height of the Screen. 1500 × 1000 worked well for this example.

7 — This is a list of different colors, which we will deal with later.

8 — This creates the first turtle object, which we will use as a virtual pen.

9 — This sets the speed of the pen object, specifically the animation speed, to 0, which is ironically the fastest setting.
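The setup steps so far could look roughly like the sketch below. The window title, the color list, and the function name create_screen_and_pen are my own assumptions for illustration; only the Turtle calls themselves (Screen, title, bgcolor, setup, Turtle, speed) come from the standard library.

```python
# Sketch of the setup steps; titles, colors, and names are illustrative only.
SCREEN_WIDTH, SCREEN_HEIGHT = 1500, 1000

# Any color names that Tk recognizes will work here.
COLORS = ["red", "orange", "yellow", "green", "cyan", "white", "violet", "pink"]

def create_screen_and_pen():
    """Open the Turtle window and return (screen, pen)."""
    # Imported lazily: Turtle needs Tk, which may be missing on headless machines.
    import turtle

    screen = turtle.Screen()       # the singleton screen object
    screen.title("Tag Cloud")      # optional: text shown in the title bar
    screen.bgcolor("black")        # optional: background color
    screen.setup(width=SCREEN_WIDTH, height=SCREEN_HEIGHT)

    pen = turtle.Turtle()          # the turtle used as a virtual pen
    pen.speed(0)                   # 0 is the fastest animation speed
    return screen, pen
```

Calling create_screen_and_pen() on a machine with a display opens the window and hands back the two objects the rest of the walkthrough uses.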

10 — This is the sentence that will be visualized in the form of a tag cloud.

11 — The .split() function is an incredibly useful string function that splits a string into a list of words, using whitespace as the separator and leaving the spaces out of the result.

12 — This creates a dictionary to store all the unique words in the string as keys and their counts as the values.

13 — This creates a simple for loop that iterates through the list of words that we called stringList.

14 — This finds the count, or how many times the word occurs, of the current word in the string.

15 — This adds a key and value pair to dictionaryOfWordsAndTheirCounts where the key is the current word and the value is its count.

Note: If you are concerned that the for loop will iterate over the same words multiple times, that is not a problem. Since every key in a dictionary has to be unique, a repeated word simply overwrites its existing entry with the same count, so each word ends up in the dictionary only once.
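The counting steps above can be sketched as follows. The names stringList and dictionaryOfWordsAndTheirCounts follow the walkthrough; counting with the list's .count() method is my assumption about how the count was taken.

```python
# The sentence with punctuation already removed (apostrophes kept).
sentence = ("Nory was a Catholic because her mother was a Catholic and "
            "Nory's mother was a Catholic because her father was a Catholic and "
            "her father was a Catholic because his mother was a Catholic or had been")

stringList = sentence.split()  # split on whitespace into a list of words

dictionaryOfWordsAndTheirCounts = {}
for word in stringList:
    # A repeated word recomputes the same number and overwrites its own key,
    # so the dictionary still ends up with one entry per unique word.
    dictionaryOfWordsAndTheirCounts[word] = stringList.count(word)

print(dictionaryOfWordsAndTheirCounts["Catholic"])  # 6
print(len(dictionaryOfWordsAndTheirCounts))         # 14
```

For longer texts, collections.Counter(stringList) from the standard library produces the same counts in a single pass.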

16 — This creates a list of lists, where each inner list has 2 items: the key first and the value second. Each inner list holds the key and value pair for one unique word.

17 — This creates a for loop that iterates through every item in the list of lists.

18 — This stores the word name of the current word.

19 — This stores the word count of the current word.
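Here is one way steps 16 to 19 might look, using a small stand-in dictionary. The names counts, listOfPairs, wordName, and wordCount are my own and may differ from the original code.

```python
counts = {"Catholic": 6, "was": 6, "her": 3}  # small stand-in dictionary

# One [word, count] list per unique word, in insertion order.
listOfPairs = [[word, count] for word, count in counts.items()]

for pair in listOfPairs:
    wordName = pair[0]   # the key: the word itself
    wordCount = pair[1]  # the value: how many times it appears
    print(wordName, wordCount)
```

Looping over counts.items() directly would work just as well; the list of lists simply makes the key and value positions explicit.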

20 — This picks the pen up so the turtle does not draw a line as it moves, just like you would lift your pen off a real sheet of paper so you don’t draw on it when you don’t want to.

21 — This makes the turtle invisible on the screen so you don’t see that black triangle drawing things.

22 — The .goto() function sends the pen object to a certain coordinate on the Screen. However, instead of a fixed x and y value, we set the x and y value to a random number from the numbers shown in parentheses using the random.randint() function. This sends the turtle to a random position on the Screen to draw a word.

23 — Since we want each word to be drawn in a random color, this line uses the random.randint() function to pick a number between 1 and 15, with each integer being the index of a color in the listOfRandomColors list. Keep in mind that random.randint() includes both endpoints, so the upper bound must not exceed the last valid index of the list.

24 — This sets the color of the pen for the current iteration to the color at the index of listOfRandomColors that we picked in the previous line.
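Steps 20 to 24 could be sketched like this. The helper names, the color list contents, and the 100 pixel margin are assumptions of mine; the original used its own hard-coded bounds in random.randint(). The pen argument is expected to be a turtle.Turtle object.

```python
import random

listOfRandomColors = ["red", "orange", "yellow", "green", "cyan", "white"]

def random_position(width=1500, height=1000, margin=100):
    """Pick a point inside the window; Turtle puts (0, 0) at the center."""
    x = random.randint(-width // 2 + margin, width // 2 - margin)
    y = random.randint(-height // 2 + margin, height // 2 - margin)
    return x, y

def random_color(colors):
    """Pick a color by index; randint includes both endpoints."""
    index = random.randint(0, len(colors) - 1)
    return colors[index]

def place_pen(pen, colors):
    """Move the pen to a random spot without drawing, in a random color."""
    pen.penup()                      # lift the pen so no line is drawn
    pen.hideturtle()                 # hide the turtle cursor
    pen.goto(*random_position())     # jump to a random coordinate
    pen.color(random_color(colors))  # set the color for the next word

print(random_position(), random_color(listOfRandomColors))
```

The margin keeps the center of each word away from the window edges; random.choice(colors) would be a shorter way to pick the color.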

25 — This brings everything together and draws the current word on the Screen. The first argument of the .write() function tells the function what word to write. The second argument aligns the word; in this case, we decided to center it. The last argument deals with the font. The font type can be changed from Comic Sans MS. The font size is set to the word count times 30. The factor by which the word is enlarged can be changed, but 30 worked well here.
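A minimal sketch of that drawing step, assuming pen is a turtle.Turtle object. The function name draw_word and its scale and family parameters are hypothetical; the write() call with its align and font arguments is the standard Turtle method.

```python
def draw_word(pen, word, count, scale=30, family="Comic Sans MS"):
    """Write `word` centered at the pen's position, sized by its count."""
    font_size = count * scale  # e.g. a word seen 6 times gets size 180
    pen.write(word, align="center", font=(family, font_size, "normal"))
    return font_size
```

With the example sentence, draw_word(pen, "Catholic", 6) would draw the word at size 180, while a word that appears once is drawn at size 30.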

26 — This function creates an infinite loop used to run the application, waits for an event to occur, and processes the event as long as the window is not closed.

27 — This waits for the user to click the mouse somewhere in the window. In our case, it’ll be the X button in the top right. When this click event occurs, the response is to close the turtle window and exit.
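For the closing step, a small sketch, assuming screen is the object returned by turtle.Screen(). The wrapper name wait_for_close is hypothetical. As far as I know, exitonclick() also enters the event loop by default, so it can keep the window open on its own.

```python
def wait_for_close(screen):
    """Keep the window open until the user clicks on it."""
    # exitonclick() binds a mouse click on the canvas to closing the window
    # and, by default, also enters the event loop.
    screen.exitonclick()
```

Closing the window with the X button ends the program as well.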

That’s it!

Write your mind, or read others’. Visit The Academically Driven!

Email: theacademicallydriven@gmail.com

Instagram: @theacademicallydriven

Facebook: @theacademicallydriven

LinkedIn: The Academically Driven

Twitter: @AcademicallyThe

Aryaman Kukal
The Academically Driven

tech enthusiast ~ programmer ~ avid writer ~ always curious; rarely express it ~ freshman @ American High ~ “Make it work, make it right, make it fast.”