In the Dev 101 series I cover some basic concepts of computer programming for a broad audience. I guess it’s the explanation I was looking for myself, when I first started out as a programmer…
TL;DR: Your code is full of teachable moments. Use them to your benefit.
Running into trouble
Learning the basics of a programming language, especially your first language, is often a mixed experience. On the one hand, you’ll probably feel empowered by the considerable might that now lies at your key-stroking fingertips. Need to harvest a text corpus from a news archive? No problem, you should now be able to throw together a handy web crawler. Want to reorganize your photo library? Easy, just iterate over the images, read the file modification dates and copy them to a folder in YYYY-MM format.
On the other hand, you’ll quickly discover that the basics are just the tip of the proverbial iceberg. And the most common way to discover this is by running into trouble. Let’s take our web crawler for instance, which might look something like this:
This code might suit your needs if you run it only once, but chances are you’ll need to improve it. For instance, we should really wrap our URL request in a try ... except to catch HTTP and URL errors. Moreover, this version ‘only’ visits 1000 pages, but what if you need many more or the downloads are huge? What are the ways to speed this up?
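A minimal sketch of that try ... except wrapper, using only the standard library. The function name fetch and the 10-second timeout are assumptions for illustration, not part of the original crawler:

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Return the body of `url` as bytes, or None if the request fails."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        # HTTPError is a subclass of URLError, so it has to be caught first.
        print(f"HTTP error {err.code} for {url}")
    except URLError as err:
        # The resource could not be reached at all (DNS failure, refused connection, ...).
        print(f"Could not reach {url}: {err.reason}")
    return None
```

Returning None lets the crawler skip a failed page and carry on with the next one instead of crashing halfway through a long run.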
Our photolibrary script, on the other hand, might run like this:
You can try to run this, but I wouldn’t advise it as there is (at least) one major bug here. Think for a second: what would happen, for instance, when the script copies two different files that share the same name and were taken in the same month?
cp London/selfie.jpeg /home/username/Photolibrary/202012/selfie.jpeg
cp Bruges/selfie.jpeg /home/username/Photolibrary/202012/selfie.jpeg
That’s right; now you are overwriting files. How would you solve this?
So for all your initial enthusiasm about being able to “automate the boring stuff with Python”, it is easy (and very normal, by the way!) to run into trouble when you start writing code.
The solution to this is both simple and, at the same time, a life’s work: keep learning. If you keep studying, practicing, and reading, you’ll learn about more advanced concepts and techniques that will help you with any trouble you encounter. For instance, learning about concurrency in Python, like multithreading with the concurrent.futures module, will enable you to make a much faster crawler. And when you realize the potential of handling filesystem paths not as strings but as objects with the pathlib module, you will easily fix the photolibrary script with unique filenames by inserting some kind of index into the filename.
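A minimal sketch of such a unique-filename helper with pathlib. The function name unique_path and the _1, _2 numbering scheme are assumptions for illustration, not the original fix:

```python
from pathlib import Path


def unique_path(target):
    """Return `target`, or a variant like 'selfie_1.jpeg' if it already exists."""
    target = Path(target)
    candidate = target
    index = 1
    while candidate.exists():
        # stem is the file name without its extension, suffix is the extension.
        candidate = target.with_name(f"{target.stem}_{index}{target.suffix}")
        index += 1
    return candidate
```

In the photolibrary script, the copy step would then write to unique_path(destination) rather than to destination directly, so the second selfie.jpeg of the month lands next to the first instead of on top of it.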
However, the question is often: “Where to start?” There is so much information out there, so many books to read, courses to take or coding exercises to do. One way to start is by recognizing the teachable moments in your own coding practice. Below I offer some examples of such moments with Python, but the learning principle applies to all languages.
Although in education teachable moments are by definition unplanned opportunities, it is nevertheless possible to develop a mindset of pattern recognition, as these teachable moments can be grouped into a couple of categories.
The first kind of teachable moment is when you notice you are repeating yourself. For instance, while writing a tool for inspecting a legacy code base, I found myself doing this with dictionaries quite often:
As the script grew more complex, I grew tired of checking whether an item was already present in a dict and, if not, initializing its default value. It was bloating and obscuring the code too. There had to be a better way. And of course, there was.
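One standard-library candidate for that better way is collections.defaultdict. This is an assumption, since dict.setdefault() or collections.Counter would serve as well. A minimal sketch with made-up data, showing the repetitive pattern next to the shorter one:

```python
from collections import defaultdict

words = ["spam", "eggs", "spam", "ham", "spam"]

# Repetitive version: check for the key, then initialise it by hand.
counts = {}
for word in words:
    if word not in counts:
        counts[word] = 0
    counts[word] += 1

# defaultdict version: a missing key is created with int(), which is 0.
counts_dd = defaultdict(int)
for word in words:
    counts_dd[word] += 1

# Both approaches build the same mapping.
print(counts == dict(counts_dd))
```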
I’m sure you’ve read the advice “don’t repeat yourself” before, but I hope this example shows how to recognize and act upon it as a teachable moment.
There can be only one
One of the maxims of the famous Zen of Python says:
There should be one — and preferably only one — obvious way to do it.
However, in the field this is not always obvious. Indeed, when you first start with Python (and many other programming languages, but not all!) there often appear to be many different ways of doing the same thing. Consider this example of looking for a sub-string in a string:
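A minimal sketch of three such alternatives. Only the name check_find appears in the timing output further down; check_in, check_re and the sample text are assumptions for illustration:

```python
import re

TEXT = "There should be one obvious way to do it."


def check_in(text=TEXT, sub="obvious"):
    # Membership test: answers only the question "is it in there?"
    return sub in text


def check_find(text=TEXT, sub="obvious"):
    # str.find() returns the index of the first match, or -1 if there is none.
    return text.find(sub) != -1


def check_re(text=TEXT, sub="obvious"):
    # re.escape() makes sure `sub` is matched literally, not as a pattern.
    return re.search(re.escape(sub), text) is not None
```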
When you first start programming, you are either not aware of alternative options or you just choose whichever one you are most familiar with. However, whenever you find yourself doing the latter, you should take the opportunity to really consider the options. As always with writing code, there are several factors at play: readability, efficiency, re-usability, and so on.
For instance, studying the different implementations above, you will find:
The in operator is Python’s way to perform membership tests. It is the most general approach to this problem and also works with other data types (int in list, bool in tuple, string in dictionary keys, …). It is therefore also well suited if you want to refactor this functionality into a general function or class.
string.find() is a method specific to strings. It returns the index of the sub-string in the string, or -1 when the sub-string is not found. This means that there is more going on here than simply checking for membership, and this also explains why this option is likely to be slower:
When you run this in a Jupyter Notebook with the magic function %timeit you see the difference, which can be considerable in performance-critical contexts:
%timeit check_find()
259 µs ± 31.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
273 µs ± 66.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
The third option, searching with a regular expression, is even slower (even if you do the import outside of the function). As you can read in the documentation, “regular expression patterns are compiled into a series of bytecodes which are then executed by a matching engine written in C”, which means that there is even more going on under the hood here.
253 µs ± 34.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.4 ms ± 65.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In this way, you see how discovering “equivalent” implementations can be another teachable moment to further your understanding of how code really works.
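Outside a notebook, a similar comparison can be made with the standard timeit module. This is only a sketch: the sample text and the labels are assumptions, and the figures will differ per machine and Python version, so no particular outcome is implied:

```python
import re
import timeit

text = "lorem ipsum " * 1000 + "needle"
pattern = re.compile("needle")

candidates = {
    "in": lambda: "needle" in text,
    "find": lambda: text.find("needle") != -1,
    "re": lambda: pattern.search(text) is not None,
}

timings = {}
for name, func in candidates.items():
    # repeat() returns one total per run; the minimum is the least noisy figure.
    timings[name] = min(timeit.repeat(func, number=1000, repeat=5))
    print(f"{name:>4}: {timings[name] / 1000 * 1e6:.2f} µs per call")
```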
This can’t be right
Sometimes you will find yourself writing code that functions well enough, but still makes your Spidey-sense tingle. For me, this happened when I first started working with multiple conditions. For instance, I found myself concatenating multiple or operators like so:
if (char == "!" or char == "." or char == "?" or char == ";" or char == "," or char == ":"):
    # do something
After doing this for a while and reading about good code being concise, I just knew there had to be a better way. In fact, we’ve already discussed it:
if char in ["!", ".", "?", ";", ",", ":"]:
    # do something
Another example of multiple conditions is when one of my students was translating API error codes into exit messages like this:
It really blew their mind when I showed them you could just put all the error codes and messages from the API documentation into a dictionary and then just exit as follows:
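A minimal sketch of that dictionary lookup. The error codes, the messages and the function names are hypothetical stand-ins for whatever the real API documentation lists:

```python
import sys

# Hypothetical codes and messages, standing in for a real API's documentation.
ERROR_MESSAGES = {
    400: "Bad request: check the query parameters.",
    401: "Unauthorized: the API key is missing or invalid.",
    404: "Not found: the requested resource does not exist.",
    429: "Too many requests: slow down and retry later.",
}


def exit_message(code):
    """Look up the message for `code`, with a fallback for undocumented codes."""
    return ERROR_MESSAGES.get(code, f"Unknown error code: {code}")


def exit_with(code):
    # sys.exit() with a string prints it to stderr and exits with status 1.
    sys.exit(exit_message(code))
```

Adding a new error now means adding one line to the dictionary, rather than another elif branch.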
So whenever you notice that your code is getting verbose or awkward, trust your instincts and use that moment to learn about a better solution!
How on earth…?
Sometimes you’re not so much looking for a better way to do things as for any way to solve your coding conundrum. At times, the task at hand will feel excruciatingly difficult, even seem impossible.
This is another important teachable moment. Whenever you feel stuck on a problem, it usually means you are at the limits of your knowledge and skills, and should look for new information. This often boils down to shifting perspective or thinking outside the box.
Recently, for instance, I got stuck on trying to come up with a regular expression to detect matching brackets. Something that could do this:
After spending about an hour on the regular expression, I was ready to give up. But then I Googled (actually “DuckDuckGoed”) the problem in depth and found out that this is actually impossible to solve in general with regular expressions, because arbitrarily deep nesting requires keeping count of the open brackets. Instead, you need to resort to parsing techniques, a very simple implementation of which could be:
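One simple stack-based sketch of the idea. The function name and the set of supported bracket types are assumptions, and this is not necessarily the implementation the original listing showed:

```python
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def brackets_match(text):
    """Return True if every bracket in `text` is closed in the right order."""
    stack = []
    for char in text:
        if char in OPENERS:
            stack.append(char)
        elif char in PAIRS:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != PAIRS[char]:
                return False
    # Anything left on the stack was opened but never closed.
    return not stack
```

Because the function looks at every character, a bracket inside a string literal is counted like any other, so perfectly valid code containing one is reported as unbalanced.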
This was not the finished product (I’ll leave it up to you to consider how to handle brackets within strings like print(len("("))!), but it was a huge step forward. This shows how getting stuck should not lead to frustration and despair (believe me, though, I know the feeling), but actually offers an opportunity to learn.
Not too long ago, I needed to batch process some Word files and I found this one-liner to extract the text with the docx module:
from docx import Document
text = ''.join([p.text for p in Document('myfile.docx').paragraphs])
Such clever pieces of code are all well and good, until you need to change something, which is usually the point at which you realize that you don’t really know what is going on in the line. So in order to find out, I broke the line up into the objects that were created, checked their type and executed dir() on them. In case you don’t know, dir() is a function that returns a list of valid attributes for an object, thus offering some insight into its structure and functionality:
The output is:
['_Document__body', '__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '_block_width', '_body', '_element', '_parent', '_part', 'add_heading', 'add_page_break', 'add_paragraph', 'add_picture', 'add_section', 'add_table', 'core_properties', 'element', 'inline_shapes', 'paragraphs', 'part', 'save', 'sections', 'settings', 'styles', 'tables']
<class 'list'>
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
<class 'docx.text.paragraph.Paragraph'>
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_element', '_insert_paragraph_before', '_p', '_parent', 'add_run', 'alignment', 'clear', 'insert_paragraph_before', 'paragraph_format', 'part', 'runs', 'style', 'text']
<class 'str'>
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
This tells you all you need to know:
Document objects have an attribute paragraphs
paragraphs is a list containing Paragraph objects
Paragraph objects have an attribute text, which is a string
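In listings like the ones above, the double-underscore names tend to drown out the attributes you are actually looking for. A small sketch of a filter that keeps only the public names; the helper names public_attrs and describe are hypothetical, and the demo uses built-in types so that it runs without the docx package:

```python
def public_attrs(obj):
    """Return the names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


def describe(obj):
    # Print the type first, then only the attributes meant for everyday use.
    print(type(obj))
    print(public_attrs(obj))


describe([])  # the list type, followed by append, clear, copy, ...
describe("")  # the str type, followed by capitalize, casefold, center, ...
```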
Personally, I like to inspect variables like this, but you can also heed that other infamous piece of programming advice, RTFM, and read the docx manual to find out how to use it.
Reading manuals and documentation is a skill on its own and takes practice. So this is where we find our third pattern of teachable moments. Whenever you find yourself using a function or a method that you are not 100% familiar with, look it up in the documentation. Even when you think you know it well, give it a try. You might be surprised to find quite a few things you weren’t aware of. Take open() for instance, which you probably consider well-known. Can you really claim to know what all of its documented parameters do?
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
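As a small illustration of two of those parameters, encoding and errors, here is a sketch that reads the same file twice. The file name and its contents are made up for the example:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "menu.txt"
    # Write "café" encoded as Latin-1, so the file is not valid UTF-8.
    path.write_bytes("caf\u00e9".encode("latin-1"))

    # errors="replace" swaps undecodable bytes for U+FFFD instead of raising.
    with open(path, mode="r", encoding="utf-8", errors="replace") as handle:
        lossy = handle.read()

    # With the matching encoding, the text is read back unchanged.
    with open(path, mode="r", encoding="latin-1") as handle:
        exact = handle.read()

print(repr(lossy), repr(exact))
```

Without the errors argument, the first read would raise a UnicodeDecodeError, which is the kind of detail that is easy to miss until you read the documentation.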
So digging deeper — even in familiar territory — by exploring the documentation is another excellent way to improve.
Let me conclude this article with my personal opinion about what it takes to be a software engineer. I believe there is only one talent you need, and that is being a quick and independent learner.
However, I also believe this is not a matter of “you have it or you don’t”. It’s not about “raw” talent. All software engineers, whether they are CTOs or rookies, have to put in considerable effort to learn new technologies. The best software engineers grab every chance they get to learn.
And that’s what recognizing teachable moments is all about.