My Solution to Python Challenge — Level 2: Data Structure

Keen Wilson
5 min readFeb 11, 2019

--

My husband and I are engaging in a coding riddle called the Python Challenge. I haven’t gone that far, but I found myself learning something about data structure when I was working on level 2.

Hint from the page: recognize the characters. maybe they are in the book, but MAYBE they are in the page source.

When you view page source, this is what you see.

Find rare characters in the mess.

So this is how I approach this problem:

Loading data

First, I manually copy-and-paste the text to a .txt file. Then I use Python’s built-in open function to return a file object and use a read() method for reading the content of the file. I save the content to a variable called text.

text = open(‘resources/level2.txt’).read()

Breaking down a problem

Now we have some text, the challenge wants us to write a program to find rare characters, in other words, the least repeated character in this text.

First, we need to know how many times each character repeated. Once we have that information, we can find the least repeated character.

What kind of data structure is useful for storing this information? A dictionary! Because a dictionary is a collection of key-value pairs, so here we can use the characters as the keys and the repetition as a value.

So I’m going to start by defining a dictionary. Let’s call it char or character frequency. We set it to an empty dictionary.

char_frequency = {}

Now we need to iterate over the string, get each character, and update its frequency in this dictionary. So we need a for loop.

for char in text:

Now first we need to see if we have this character in our dictionary. If we do have this character, then we simply get the value of its frequency and increment it by 1. Otherwise, we need to add it to our dictionary. So, we set char_frequency of char to 1.

    if char in char_frequency:        char_frequency[char] += 1    else:        char_frequency[char] = 1

Let’s print this dictionary and see what we get.

print(char_frequency)

Run the program using a code-runner extension in VScode by saving changes and pressing Ctrl + alt +n.

Print char_frequency to the output

So '%' is repeated 6104, '$'is repeated 6046, '@' is repeated 6157 times and so on.

Now, this output is a little bit unreadable. So I’m going to show you a trick to make it more readable. We have a module called pretty printing. On the top of your code, we import pprint function form pprint module

from pprint import pprint

With this function, we have more control over printing stuff on the terminal.

So now, instead of the regular print function, we use pprint and as the second argument, we pass a keyword argument called width that determines the number of characters on each line. If this output doesn’t fit, this function will add a line break. Let’s set it to 1 and see what happens when we run the program.

pprint(char_frequency, width=1)

Pretty printing the result

Finding an answer

So now we have all the information to solve this problem. The next step is to sort this dictionary by the frequency of characters. However, dictionaries, like sets, are unordered collections, we cannot sort them. We can only sort lists. So we need to pull out the items in this dictionary and put them in a list for sorting.

Basically, we need to takeout each key-value pair, convert it to a tuple and then put it in a list. We will end up with a list of tuples that we can easily sort. How can we do this?

I’m going to call a sorted built-in function that builds a new sorted list from an iterable. This function takes an iterable and sorts it. As an iterable, I’m going to pass char_frequency.items(). This .items() method returns a view object that displays a list of dictionary (key, value) tuple pairs. So at this point, let’s see what we get if we print this result.

print(sorted(char_frequency.items()))

The result from running a sorted function on char_frequency.items()

We get a list of tuples!! In each tuple, we have two items: the first is the character and the second is the repetition or frequency.

However, as you can see, the list is not sorted because, by default, this sorted function doesn’t know how to sort these tuples.

We can pass a second argument to a sorted function to customize the sort order, key. We set this to a lambda which is an anonymous function. This lambda function takes a key-value pair kv and returns the value. In this case, we access the value in tuples by using the square brackets [] for slicing along with the index or indices to obtain value available at that index. So the value this lambda function will return is kv[1]. This is the frequency of each character, we’re going to use that for sorting.

So, let’s run the program one more time.

Now, as you can see this list is sorted. e, q, u, a, l, i, t, y are repeated only once.

Getting an answer

Finally, the characters with a frequency equal to 1 in this list are the solution to our problem. Rare characters with only one occurrence are e, q, u, a, l, i, t, y. The answer is “equality”. I replace equality in the URL and it leads me to the next challenge!

http://www.pythonchallenge.com/pc/def/equality.html Hints: One small letter, surrounded by EXACTLY three big bodyguards on each of its sides.

And that brings us to the end of my solution to level 2 of #pythonchallenge. Thank you for reading and Happy Coding!

Photo by Steve Halama on Unsplash

--

--

Keen Wilson

Web Developer with coding experience in #JavaScript #MERN stack Located in Kansas City area. MBA with an ad agency background #DigitalMarketing #SEO