I’m coming off a big high from speaking at PyCon Canada earlier this month. I talked to a lot of smart people during the conference, and it seemed everyone was talking about the same hopes and pain-points. This is an attempt to combine the faint whisperings of the community into a single cohesive post.
I love Python. I’ve been using Python for almost 10 years now, across projects both personal and professional. My work is equal parts data analytics and rapid prototyping, so Python is a natural fit. The great draw of Python is that it has packages for everything: machine learning, data exploration, reproducible research, visualization, cloud functionality, web APIs, and the kitchen sink.
What has kept me coming back is the amazing community, who are generally a pleasure to interact with online, on both StackOverflow and GitHub. In 2015, a friend of mine was organizing PyCon Canada at our university, and roped me into volunteering. I found the community was just as friendly in person, as well as incredibly creative (who thinks of using Python to characterize the structure of wood???). This year, I thought I would give back a little, and gave a talk at PyCon in my home town of Toronto. I was blown away by how much the community had grown in just 3 short years. When I was told I would be giving my talk in “the ballroom”, I thought it was just the name of a room. It turned out to be a bit bigger than that.
As before, I found the community filled with smart and creative people. A perfect example of this is the talk A Bossy Sort of Voice: Uncovering gender bias in Harry Potter with Python, which (regardless of your opinion on the subject) is a hilarious and eye-catching title.
However, as with any engineering effort, Python is a work-in-progress. Our perception of the language today is different than it was even five years ago, so things that may have seemed outlandish then are now not only possible, but logical. In this post, I want to lay out what I see as promising directions for the community, and how I would like to see it grow.
Many good things have either landed in 2018 in Pythonland, or have overcome their growing pains. Here are my personal favourites:
A Jupyter Notebook is a web application to execute Python (and other languages) and view the results in-line including graphs, prettified tables, and markdown-formatted prose. It also automatically saves intermediate results (similar to a REPL), allows exporting to many formats, and has a hundred other features. For a deeper dive, see my PyCon talk. Jupyter Notebooks are very widely used in the community, especially those in research and scientific fields. The Jupyter team very justifiably won the 2017 ACM Software System Award.
JupyterLab is an exciting improvement over traditional Jupyter notebooks. It includes some compelling features like cell drag-and-drop, inline viewing of data files (like CSV), a tabbed environment, and a more command-centered interface. It definitely still feels like a beta, with some glitches in Reveal.js slide export functionality and cell collapse not working as expected. But on the whole it’s a perfect example of a good tool getting even better and growing to fit the sophistication of its users.
mypy, a static type checking tool for Python, has existed for a while. However, it has gotten really good this year, to the point where you can integrate it into your production project as part of git hooks or other CI flow. I find it an extremely helpful addition to all codebases, catching the vast majority of my mistakes before I write a single line of test code. It’s not without pitfalls, however. There are many cases where you have to make annotations that feel burdensome, such as `__init__(self, *args) -> None`, and other behaviour which I view as just strange. The lack of typeshed files for many common modules¹ such as:
continues to be an issue in integrating this into your CI system without significant configuration. The `--ignore-missing-imports` option becomes basically mandatory. In the future, I hope that it becomes a community standard to provide typeshed files for all modules intended to be used as libraries.
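To make the annotation burden concrete, here is a minimal sketch. The `Counter` class and the `describe` function are hypothetical, made up purely for illustration. In its default configuration mypy does not check the bodies of functions that carry no annotations at all, which is why an `__init__` with nothing else to annotate still needs the explicit `-> None`.

```python
from typing import List


class Counter:
    # Without "-> None" this method would have no annotations at all, and
    # mypy (in its default configuration) would not type check its body.
    def __init__(self) -> None:
        self.counts: List[int] = []

    def add(self, value: int) -> None:
        self.counts.append(value)

    def total(self) -> int:
        return sum(self.counts)


def describe(counter: Counter) -> str:
    # Writing `return counter.total()` here is the kind of mistake mypy is
    # meant to catch: the function promises a str but would return an int.
    return f"total={counter.total()}"


counter = Counter()
counter.add(2)
counter.add(3)
print(describe(counter))
```

Running mypy over a file like this in a git hook is the workflow described above; the annotations cost a few extra characters per function, and in exchange the mismatch in `describe` would be reported before any test is written.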
Pipfile and pipenv
The top-level motivation is dependency management with `requirements.txt`. Although its problems seem to be well-known in the community, the closest article I’ve seen to an enumeration of them is this post. I recommend a read, but here is a TLDR:
There is no standard for `requirements.txt`: is it an enumeration of all primary and secondary dependencies, or just the strict requirements? Does it include pinned versions? Additionally, splitting out development-time requirements is very ad-hoc². Different groups do different things, which makes reproducible builds a problem.
Keeping the list of dependencies up to date required `pip install $package` followed by `pip freeze > requirements.txt`, which was a really clunky workflow with a ton of problems.
The development-management ecosystem consists of three tools and standards (including `requirements.txt`) which do not interop cleanly. Since you’re trying to accomplish a single task, why isn’t there a single tool to help?
Pipenv creates a virtualenv automatically, installs and manages dependencies in that virtualenv, and keeps the
While the idea is great, using it is very cumbersome. I’ve run into many issues using it in practice and often have to fall back on the previous way of doing things, using an explicit virtualenv for example. I also found that locking is very slow (a problem partially stemming from the `setup.py` standard, which is the source of many other issues in the tooling ecosystem).
f-strings are fantastic! Many others have written about the joy of f-strings, from their natural syntax to the performance improvements they bring. I see no reason to repeat these points; I just want to say it’s an amazing feature that I have been using regularly since it landed.
An annoyance they introduce is the dichotomy between writing `print` statements and `logging` statements. The `logging` module is great, and by default does not format strings if that log message is turned off. So you might write:
x = 3
logging.debug("x=%d", x)
This would print `x=3` if the log-level is set to `DEBUG`, but would not even perform the string interpolation if the log-level is set higher. This is because `logging.debug` is a function, and the format string and its values are passed as separate arguments. You can see how it works in the very readable CPython source code (the `logging` module itself is written in Python). However, this functionality disappears if you write the following:
x = 3
logging.debug(f"x={x}")
The string interpolation happens regardless of log-level. This makes sense at a language-level, but the practical consequences are irritating in my natural workflow. I write `print` statements first and convert them to `logging` statements later. So each print statement has to be manually rewritten to fit the different type of string interpolation. I don’t have a good idea of how to solve this problem, but I want to point it out as I haven’t seen anyone else write about this particular problem.
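The difference is easy to observe with a small experiment. This is a sketch, not part of any real project: the `Expensive` class and the logger name `fstring_demo` are made up for illustration. The object counts how often it is converted to a string, so we can see which style of call actually performs the interpolation when `DEBUG` messages are turned off.

```python
import logging


class Expensive:
    """Hypothetical value that counts how often it is turned into a string."""

    def __init__(self) -> None:
        self.str_calls = 0

    def __str__(self) -> str:
        self.str_calls += 1
        return "3"


logger = logging.getLogger("fstring_demo")
logger.setLevel(logging.INFO)  # DEBUG messages are turned off

x = Expensive()

# %-style arguments: the logger checks the level first, and since DEBUG is
# disabled the message is never formatted.
logger.debug("x=%s", x)
calls_after_lazy = x.str_calls

# f-string: Python builds the string before logging.debug is even called,
# so the conversion happens regardless of the log level.
logger.debug(f"x={x}")
calls_after_fstring = x.str_calls

print(calls_after_lazy, calls_after_fstring)
```

Neither call emits a log line here, yet only the f-string version pays for the string conversion. For a plain integer the cost is negligible, but for large data structures it can add up.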
As with any project that has been around for as long as Python (wow it’s as old as I am), there are modules and ideas which are showing their age. This is not meant to be a shade-throwing contest, but laying down the gauntlet to say we as a community can do better.
Tox is still the best (or perhaps more accurately the de-facto) test-runner we have in Pythonland, and it’s quite bad. Not only is the syntax for `tox.ini` files a bit unintuitive, the tool is also extremely slow. It’s not really tox’s fault, as the whole `setup.py` system is broken by design. Because these files declare package dependencies and at the same time can execute code, discovering dependencies is inherently slow. This leads to slowness in a number of tools. I believe this is something we should tackle as a community in 2019.
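To illustrate why executable dependency declarations are hard on tooling, here is a sketch of the kind of logic that can end up in a `setup.py`. The helper `compute_install_requires` is hypothetical, and the package names are only placeholders for illustration. Because the list is computed at run time, a tool cannot know the dependencies without executing the file in the target environment.

```python
import sys
from typing import List


def compute_install_requires(platform: str = sys.platform) -> List[str]:
    """Hypothetical helper: the dependency list depends on where it runs."""
    requires = ["requests"]
    if platform == "win32":
        # An extra dependency that only exists on one platform.
        requires.append("pywin32")
    return requires


# In a real setup.py the result would be handed to setuptools, along the
# lines of: setup(..., install_requires=compute_install_requires())
print(compute_install_requires("linux"))
print(compute_install_requires("win32"))
```

A purely declarative format avoids this: the dependency list can be read without running anything, which is what makes fast resolution and locking possible.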
As an aside, tox still has no Pipfile support, which makes the value proposition of using Pipfile much lower. As with everything, it’s not just about how good the idea is, but the tooling support around it.
type annotations are for tools only
Quoting from PEP 484:
Using type hints for performance optimizations is left as an exercise for the reader.
This is understandable given the state of Python at the time that the PEP was written, but it’s now time to move on. We have successfully transitioned to Python 3, and 359/360 of the most commonly downloaded packages on PyPI are Python 3-compliant. Type hints are here to stay, and are well-loved by the community. Moving forward, Python type hints should carry additional benefits such as performance optimization and automatic runtime type assertions. I find runtime type assertions to be both extremely helpful (especially in libraries), and very cumbersome to write manually. With type hints, this is especially annoying as you have to maintain multiple sources of truth for types: once in the annotation and again in the hand-written assertion.
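As a rough sketch of what automatic runtime type assertions could look like today, the decorator below reads a function’s annotations and checks arguments and return value against them. The names `enforce_types` and `scale` are hypothetical, and the sketch deliberately handles only plain classes such as `int` or `str`; richer hints such as generics are simply skipped rather than checked.

```python
import functools
import inspect
from typing import Any, Callable, get_type_hints


def enforce_types(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical decorator: assert simple annotated types at call time."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes are checked; other kinds of hints are skipped.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        result = func(*args, **kwargs)
        expected = hints.get("return")
        if isinstance(expected, type) and not isinstance(result, expected):
            raise TypeError(
                f"return value must be {expected.__name__}, "
                f"got {type(result).__name__}"
            )
        return result

    return wrapper


@enforce_types
def scale(value: int, factor: int) -> int:
    return value * factor


print(scale(2, 3))
```

With this in place the annotation is the single source of truth: calling `scale("2", 3)` raises a `TypeError` instead of silently returning `"222"`. The per-call overhead is the obvious cost, which is one reason first-class language support would be preferable.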
As others have written, Python 4 will probably have JIT as a first-class feature. This seems like a logical place to add performance optimization in response to type annotations.
One of my biggest gripes with Python right now is the lack of `const` or its equivalent. Of all the mistakes I make during coding, a solid 90% of them can be traced to either type-related mistakes (now mostly caught with mypy) or accidental reuse of a previous variable within the same function when I thought I was creating a new variable. I understand that there are packages for this, but I want `const` to be a first-class citizen.
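Here is a small, made-up example of the accidental-reuse mistake described above. Both functions and the sample data are hypothetical. In the buggy version the name `count` is reused inside the loop, and nothing in the language complains.

```python
from typing import List


def average_order_value_buggy(orders: List[List[float]]) -> float:
    count = len(orders)  # number of orders
    revenue = 0.0
    for order in orders:
        # Intended as a brand-new variable, but the name is already taken,
        # so the number of orders is silently overwritten.
        count = len(order)
        if count == 0:
            continue
        revenue += sum(order)
    return revenue / count  # divides by the size of the last order


def average_order_value(orders: List[List[float]]) -> float:
    count = len(orders)
    revenue = 0.0
    for order in orders:
        item_count = len(order)  # a distinct name avoids the clash
        if item_count == 0:
            continue
        revenue += sum(order)
    return revenue / count


orders = [[10.0, 20.0], [30.0]]
print(average_order_value_buggy(orders), average_order_value(orders))
```

If `count` could be declared constant, the reassignment inside the loop would be rejected up front instead of producing a plausible-looking wrong number.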
The nbconvert project is, on the whole, amazing. It allows the conversion of Jupyter notebooks into various other formats including PDF, Reveal.js slides, or an executable script. I have used the last two extensively in the past couple of months, and they have honestly changed my workflow. I can put together a notebook, then at the last moment convert it into a presentation for a weekly meeting with my colleagues to show my progress. Similarly, I can develop an idea in a notebook, then convert it into a script and put it into production with minimal changes.
That’s the idea, anyway. The reality is that the scripts produced from any sizable notebook require so much manual effort to convert that it’s often worth it to write them from scratch using cut-and-paste. I heard from a few companies that they have created wrappers around `nbconvert` to make it a bit more wieldy. I encourage these folks to open-source these contributions, if only to alleviate my personal pain.
I was encouraged to write this post by many excellent people, and I appreciate all their feedback and support. If you disagree, feel free to dunk on me on Twitter. If you feel my pain, leave your stories of discontent on the Hacker News discussion thread.
¹ these are just the ones that I have used in the past month. I’m sure there are more complete lists out there, for example the typeshed type-stubs-request tag.