Last year I spent many hours refreshing my algorithms — it’s been a while since I studied the fundamentals and you absolutely need them fresh for job interviews in Silicon Valley, and also for the fun of competitive programming.
There are many websites in which you can learn, practice and compete. Possibly the one that everyone uses, exclusively or not is leetcode, which runs a weekly competition that is full of rock stars that deliver brilliant solutions. And you get to see the solutions from the best of best!
For the easy questions the solution is usually easy to understand. For the hard ones, it might seem like black magic, and following the code requires spending time with a debugger, running the code many times, with different samples, until the logic “sinks in”. And this can become tedious.
So I had the idea of writing a program that could execute Python code line by line, document the change in variable values, maybe even the relationship between variables (such as i is a pointer in arr), and create a video in mp4 and/or gif that can be saved, uploaded, or embedded anywhere.
That idea has been on my ever growing backlog of pet projects for months.
And now there’s several alternative implementations. How?
One of the things I always find time to do every year, however busy I might be, is participate in the fantastic Google Code-In — it’s a program that Google sponsors that helps high-schoolers from around the world get started with programming in general, and more specifically, open source.
Many of the students are indeed starting out, but there’s also a bunch of kids that are already far, at least relatively speaking, in the learning curve. And they have the motivation and stamina to prove it!
We (that means my organization’s team, but that’s for a different post) came up with a number of ideas for the students to work on. One of them is this “algorithm video creator”. While the final goal was clear, we had to split it in several “bite sized” tasks that individually wouldn’t take more than one evening or so. Also a requirement is that they are reasonably approachable by the students, even if as you would expect each task was a bit harder than the one before it in the series.
In total, this series had 12 different tasks. 22 students completed the first task in the series, and only one made it to the 12th. However, the program was considered “usable and actually useful” from task 7, which was completed by six students.
Five of the projects have been integrated into our organization’s GitHub. Of course the students retain whatever rights exist and they are the administrators of each of their respective projects. Because the programs are actually very useful, at least for the use case that we had in mind, we expect them to actually grow as open source projects. We will see which implementation becomes more successful, or maybe they will merge at some point.
Here are the projects, in no special order, of course with the official sample video for each of them. All of them are hosted in GitHub.
The task descriptions
As mentioned, all these projects were completed during the course of Google Code-In, which ran from Dec 2, 2019 to Jan 23, 2020.
For those interested, here’s the descriptions for each of the tasks:
One of the (many) cool things about Python is that it can debug itself! Yes, you can write a Python program that debugs a Python program.
You can read about this here:
Your job is to write a python program that runs a python function and for each line that is executed writes to the console what variables have changed and from what values to what values. For example
Line 13: ‘x’ changed from 2 to 3
You don’t need to add breakpoints or anything fancy. For now, it’s just about understanding the program flow.
We have had a few good submissions to the previous Python debugger task.
They work well, but they have a problem: When just part of a variable is changed, they report the whole variable as having changed. Let’s extend this a bit and make it smarter by adding this constraint:
For arrays, lists, etc, when there’s a change, you need to report which element(s) have changed.
Extend your debugger so that after completing it lists:
- All the variables, their type, and where they were instantiated (function name, line number)
- For each variable, the range of values (if it’s a number), list of values (if it’s for example a string, or something in which a range doesn’t make a lot of sense), and where (in code) it took each value.
While the program is running: Add the following functionality to your debugger:
- Print the number of times each line has been executed so far
- Print the time spent in each line, both average and aggregated so far
After the program ends, write a report that includes
a) All created variables and where, as well as the ranges and type, and the initial and final value.
b) Total program execution time.
c) The number of times each line has been executed
Extend your debugger to add this functionality:
- If a program name is passed as a argument (path to a .py file) load that file and, by default, debug its main function. If that’s not possible, default to a different one.
- Allow passing a different starting function name.
- Write the output to a .text file. A suggestion here is that the .text file contains a json string per line, but you could use a different format. That json file needs to contain the same info you are currently printing to console, plus a “step” number (which is just incrementing starting at 1) and a timestamp for each step.
- If instead of a program name the argument passed is a .json file, then read that json file (which was of course generated by your own program) and write its output to screen, in human readable form.
As you can see, what we are doing here is separate the work in two phases: Analysis and reporting. The reporting should just need the json file, but not the source code of the program being debugged.
Let’s create a test suite for your program. This test suite will execute a number of well known algorithms. From the output of the program and without access to the source code, a reader (human) should be able to tell what the algorithm is doing.
For example, if the algorithm is a quick sort, then by following the output of your program and how the variable change as it runs, it should be obvious to tell that it’s sorting the input.
You don’t need to implement the algorithms yourself — it’s fine to use existing code since what you are writing is the debugger.
Create a test suite that generates the output for these well know problems:
Sorting: https://www.tutorialspoint.com/python_data_structure/python_sorting_algorithms.htm (all algorithms there)
Depth first search
Breadth first search
On the last task we were able to generate a .json file that describes the program flow — the steps, and what happens at each step.
We’re going to use this .json file (and the source code) to generate a video file that shows the program as it runs.
This is your job:
- Add a option -video to your debugger that takes the source code of the program being debugger, the .json file from the program execution and generates a video file that shows the program flow.
The video must contain two parts: To the left, display the source code, with the line just executed in a different color. Also display the number of times that line has been executed and the time spent on it so far.
To the right, the variables: Display in the different color the last modified variable as well as the current and previous values.
For the video generation itself you can use any library. You may need to use two libraries actually, one to compose each frame (this would be a graphics library that gives you a canvas and lets you add text to it) and another for the video, for example a .mp4 library. But it’s really up to you to come up with a good solution.
As always, upload to github and send us a link to the repo.
We left chapter 7 after creating a video. We’re going to improve it by make videos a bit more customizable by the end user. We thought about different ways to do it and we are leaning towards some kind of style sheet (not as in CSS, but similar concept). We suggest yaml but you can come up with something different or better if you think of something. When creating the video, IF that file exists (otherwise just do default) apply it to the output. That file should the user specify:
- Running speed
- Font to use
- Font size
- Intro text, which if present would be displayed for a few (configurable) seconds before starting to run the program. For example, maybe the user wants to write “Quicksort implementation for CS exam, by Joe Developer”.
- By default, write in small letters in each frame (when the program is running), maybe on the bottom right, “Generated with $program_name”. But allow the user to disable this since probably it will become annoying after a while. For now, we want to have this so there’s a few videos around and make the program a bit popular.
Add the following features to your program:
- Allow passing of arguments to the program to be debugged, for example if I have a program called “fibonacci.py” that prints to screen the fibonacci series up to a passed number, then I should be able to pass that number via your debugger.
- Work well with source code that doesn’t fit on the screen, which implies that may need to add scrolling (smooth, if you can), which is not supereasy to do if you want to do it well.
- In the video, have an area for the actual output of the program — if the program uses print(), its output should also appear on the video. After all, in a regular debugger, you expect to see the output of the program being debugged.
- Allow exporting to animated gif — this will make the video easy to embbed.
- Allow selecting output resolution.
Add (in the style sheet file, or whatever you are calling it) a way to specify different ways to display specific variables. For example, given this simple example:
my_list = [1, 3, 5, 7, 9] for i in range (len (my_list)): print my_list[i]
Clearly i is a pointer into my_list, yet there is no way to tell that from the source code. Come up with a good way to describe that relationship in the style sheet (also: come up with a better name for it) and a way to graphically display it in the video.
A possible idea is to allow pointer variables to be graphically below the element in the array they are pointing to.
Also, add an option to mute specific variables (don’t display them at all).
Following tasks will ask you to add more ways to represent variables in ways that are useful for well known data structures. Keep that in mind when coming up with your architecture!
Write a README.md with instructions. Make sure it contains (of course) sample videos. Make sure it mentions GCI 2019 and CCExtractor. Even if this is not a coding task and you’ll probably find it boring, take it seriously — good documentation and an attractive introduction to it can make or break the popularity of software!
We’re going to add support to a few super typical data structures — as well as we can (this is not really easy). Let’s see how far we can get.
As you can see, they are quite simple — basically they have a left node, a right node, and a value, for each for the nodes. Typically you always keep a reference to the root node so you can traverse the tree.
Your job is to allow your system to draw the tree either from the root node or from the node a variable currently contains.
Sadly, GCI is over — so now it’s time for these amazing small programs to take off on their own.
We are really proud of the students that chose to participate with us and take on this series of tasks. Two of them will win a trip to California to visit Google. They really deserve it for the effort and the passion they demonstrated during these six weeks.
We can’t wait to see what they do next.