Generating videos of algorithms implementations with Python

The problem

Last year I spent many hours refreshing my algorithms — it’s been a while since I studied the fundamentals and you absolutely need them fresh for job interviews in Silicon Valley, and also for the fun of competitive programming.

There are many websites in which you can learn, practice and compete. Possibly the one that everyone uses, exclusively or not is leetcode, which runs a weekly competition that is full of rock stars that deliver brilliant solutions. And you get to see the solutions from the best of best!

For the easy questions the solution is usually easy to understand. For the hard ones, it might seem like black magic, and following the code requires spending time with a debugger, running the code many times, with different samples, until the logic “sinks in”. And this can become tedious.

So I had the idea of writing a program that could execute Python code line by line, document the change in variable values, maybe even the relationship between variables (such as i is a pointer in arr), and create a video in mp4 and/or gif that can be saved, uploaded, or embedded anywhere.

That idea has been on my ever growing backlog of pet projects for months.

And now there’s several alternative implementations. How?

The implementations

One of the things I always find time to do every year, however busy I might be, is participate in the fantastic Google Code-In — it’s a program that Google sponsors that helps high-schoolers from around the world get started with programming in general, and more specifically, open source.

Many of the students are indeed starting out, but there’s also a bunch of kids that are already far, at least relatively speaking, in the learning curve. And they have the motivation and stamina to prove it!

We (that means my organization’s team, but that’s for a different post) came up with a number of ideas for the students to work on. One of them is this “algorithm video creator”. While the final goal was clear, we had to split it in several “bite sized” tasks that individually wouldn’t take more than one evening or so. Also a requirement is that they are reasonably approachable by the students, even if as you would expect each task was a bit harder than the one before it in the series.

In total, this series had 12 different tasks. 22 students completed the first task in the series, and only one made it to the 12th. However, the program was considered “usable and actually useful” from task 7, which was completed by six students.

Five of the projects have been integrated into our organization’s GitHub. Of course the students retain whatever rights exist and they are the administrators of each of their respective projects. Because the programs are actually very useful, at least for the use case that we had in mind, we expect them to actually grow as open source projects. We will see which implementation becomes more successful, or maybe they will merge at some point.

Here are the projects, in no special order, of course with the official sample video for each of them. All of them are hosted in GitHub.

vardbg by kdrag0n

Image for post
Image for post
vardbg

TDebugger, by Techno-Disaster

Image for post
Image for post
TDebugger

PyBud, by Tantan4321

Image for post
Image for post
PyBud

Cimico, by knightron0

Image for post
Image for post
Cimico

TinyDebug by mfarberbrodsky

Image for post
Image for post
TinyDebug

The task descriptions

As mentioned, all these projects were completed during the course of Google Code-In, which ran from Dec 2, 2019 to Jan 23, 2020.

For those interested, here’s the descriptions for each of the tasks:

Task 1

One of the (many) cool things about Python is that it can debug itself! Yes, you can write a Python program that debugs a Python program.

You can read about this here:

https://www.freecodecamp.org/news/hacking-together-a-simple-graphical-python-debugger-efe7e6b1f9a8/

https://docs.python.org/3/library/inspect.html

https://python-hunter.readthedocs.io/en/latest/cookbook.html

Your job is to write a python program that runs a python function and for each line that is executed writes to the console what variables have changed and from what values to what values. For example

Line 13: ‘x’ changed from 2 to 3

You don’t need to add breakpoints or anything fancy. For now, it’s just about understanding the program flow.

Task 2

We have had a few good submissions to the previous Python debugger task.

They work well, but they have a problem: When just part of a variable is changed, they report the whole variable as having changed. Let’s extend this a bit and make it smarter by adding this constraint:

For arrays, lists, etc, when there’s a change, you need to report which element(s) have changed.

Task 3

Extend your debugger so that after completing it lists:

  • All the variables, their type, and where they were instantiated (function name, line number)

Task 4

While the program is running: Add the following functionality to your debugger:

  • Print the number of times each line has been executed so far

After the program ends, write a report that includes

a) All created variables and where, as well as the ranges and type, and the initial and final value.

b) Total program execution time.

c) The number of times each line has been executed

Task 5

Extend your debugger to add this functionality:

  • If a program name is passed as a argument (path to a .py file) load that file and, by default, debug its main function. If that’s not possible, default to a different one.

As you can see, what we are doing here is separate the work in two phases: Analysis and reporting. The reporting should just need the json file, but not the source code of the program being debugged.

Task 6

Let’s create a test suite for your program. This test suite will execute a number of well known algorithms. From the output of the program and without access to the source code, a reader (human) should be able to tell what the algorithm is doing.

For example, if the algorithm is a quick sort, then by following the output of your program and how the variable change as it runs, it should be obvious to tell that it’s sorting the input.

You don’t need to implement the algorithms yourself — it’s fine to use existing code since what you are writing is the debugger.

Create a test suite that generates the output for these well know problems:

Sorting: https://www.tutorialspoint.com/python_data_structure/python_sorting_algorithms.htm (all algorithms there)

Binary search

Depth first search

Breadth first search

Knapsack problem

Task 7

On the last task we were able to generate a .json file that describes the program flow — the steps, and what happens at each step.

We’re going to use this .json file (and the source code) to generate a video file that shows the program as it runs.

This is your job:

  • Add a option -video to your debugger that takes the source code of the program being debugger, the .json file from the program execution and generates a video file that shows the program flow.

The video must contain two parts: To the left, display the source code, with the line just executed in a different color. Also display the number of times that line has been executed and the time spent on it so far.

To the right, the variables: Display in the different color the last modified variable as well as the current and previous values.

For the video generation itself you can use any library. You may need to use two libraries actually, one to compose each frame (this would be a graphics library that gives you a canvas and lets you add text to it) and another for the video, for example a .mp4 library. But it’s really up to you to come up with a good solution.

As always, upload to github and send us a link to the repo.

Task 8

We left chapter 7 after creating a video. We’re going to improve it by make videos a bit more customizable by the end user. We thought about different ways to do it and we are leaning towards some kind of style sheet (not as in CSS, but similar concept). We suggest yaml but you can come up with something different or better if you think of something. When creating the video, IF that file exists (otherwise just do default) apply it to the output. That file should the user specify:

  • Running speed

Task 9

Add the following features to your program:

  • Allow passing of arguments to the program to be debugged, for example if I have a program called “fibonacci.py” that prints to screen the fibonacci series up to a passed number, then I should be able to pass that number via your debugger.

Task 10

Add (in the style sheet file, or whatever you are calling it) a way to specify different ways to display specific variables. For example, given this simple example:

my_list = [1, 3, 5, 7, 9] for i in range (len (my_list)): print my_list[i]

Clearly i is a pointer into my_list, yet there is no way to tell that from the source code. Come up with a good way to describe that relationship in the style sheet (also: come up with a better name for it) and a way to graphically display it in the video.

A possible idea is to allow pointer variables to be graphically below the element in the array they are pointing to.

Also, add an option to mute specific variables (don’t display them at all).

Following tasks will ask you to add more ways to represent variables in ways that are useful for well known data structures. Keep that in mind when coming up with your architecture!

Task 11

Write a README.md with instructions. Make sure it contains (of course) sample videos. Make sure it mentions GCI 2019 and CCExtractor. Even if this is not a coding task and you’ll probably find it boring, take it seriously — good documentation and an attractive introduction to it can make or break the popularity of software!

Task 12

We’re going to add support to a few super typical data structures — as well as we can (this is not really easy). Let’s see how far we can get.

As you can see, they are quite simple — basically they have a left node, a right node, and a value, for each for the nodes. Typically you always keep a reference to the root node so you can traverse the tree.

Your job is to allow your system to draw the tree either from the root node or from the node a variable currently contains.

After GCI

Sadly, GCI is over — so now it’s time for these amazing small programs to take off on their own.

We are really proud of the students that chose to participate with us and take on this series of tasks. Two of them will win a trip to California to visit Google. They really deserve it for the effort and the passion they demonstrated during these six weeks.

We can’t wait to see what they do next.

Written by

Senior software engineer at Standard Cognition .

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store