My Solution to Python Challenge — Level 2: Data Structure
My husband and I are engaging in a coding riddle called the Python Challenge. I haven’t gone that far, but I found myself learning something about data structure when I was working on level 2.
When you view page source, this is what you see.
So this is how I approach this problem:
Loading data
First, I manually copy-and-paste the text to a .txt file. Then I use Python’s built-in open
function to return a file object and use a read()
method for reading the content of the file. I save the content to a variable called text
.
text = open(‘resources/level2.txt’).read()
Breaking down a problem
Now we have some text, the challenge wants us to write a program to find rare characters, in other words, the least repeated character in this text.
First, we need to know how many times each character repeated. Once we have that information, we can find the least repeated character.
What kind of data structure is useful for storing this information? A dictionary! Because a dictionary is a collection of key-value pairs, so here we can use the characters as the keys and the repetition as a value.
So I’m going to start by defining a dictionary. Let’s call it char or character frequency. We set it to an empty dictionary.
char_frequency = {}
Now we need to iterate over the string, get each character, and update its frequency in this dictionary. So we need a for
loop.
for char in text:
Now first we need to see if we have this character in our dictionary. If we do have this character, then we simply get the value of its frequency and increment it by 1
. Otherwise, we need to add it to our dictionary. So, we set char_frequency of char to 1
.
if char in char_frequency: char_frequency[char] += 1 else: char_frequency[char] = 1
Let’s print this dictionary and see what we get.
print(char_frequency)
Run the program using a code-runner extension in VScode by saving changes and pressing Ctrl
+ alt
+n
.
So '%'
is repeated 6104, '$'
is repeated 6046, '@'
is repeated 6157 times and so on.
Now, this output is a little bit unreadable. So I’m going to show you a trick to make it more readable. We have a module called pretty printing. On the top of your code, we import pprint
function form pprint
module
from pprint import pprint
With this function, we have more control over printing stuff on the terminal.
So now, instead of the regular print function, we use pprint
and as the second argument, we pass a keyword argument called width
that determines the number of characters on each line. If this output doesn’t fit, this function will add a line break. Let’s set it to 1 and see what happens when we run the program.
pprint(char_frequency, width=1)
Finding an answer
So now we have all the information to solve this problem. The next step is to sort this dictionary by the frequency of characters. However, dictionaries, like sets, are unordered collections, we cannot sort them. We can only sort lists. So we need to pull out the items in this dictionary and put them in a list for sorting.
Basically, we need to takeout each key-value pair, convert it to a tuple and then put it in a list. We will end up with a list of tuples that we can easily sort. How can we do this?
I’m going to call a sorted
built-in function that builds a new sorted list from an iterable. This function takes an iterable and sorts it. As an iterable, I’m going to pass char_frequency.items()
. This .items()
method returns a view object that displays a list of dictionary (key, value) tuple pairs. So at this point, let’s see what we get if we print this result.
print(sorted(char_frequency.items()))
char_frequency.items()
We get a list of tuples!! In each tuple, we have two items: the first is the character and the second is the repetition or frequency.
However, as you can see, the list is not sorted because, by default, this sorted
function doesn’t know how to sort these tuples.
We can pass a second argument to a sorted
function to customize the sort order, key
. We set this to a lambda
which is an anonymous function. This lambda
function takes a key-value pair kv
and returns the value. In this case, we access the value in tuples by using the square brackets []
for slicing along with the index or indices to obtain value available at that index. So the value this lambda
function will return is kv[1]
. This is the frequency of each character, we’re going to use that for sorting.
So, let’s run the program one more time.
Getting an answer
Finally, the characters with a frequency equal to 1 in this list are the solution to our problem. Rare characters with only one occurrence are e, q, u, a, l, i, t, y. The answer is “equality”. I replace equality
in the URL and it leads me to the next challenge!
And that brings us to the end of my solution to level 2 of #pythonchallenge. Thank you for reading and Happy Coding!