All the things I hate about Python
Python is viewed as a ubiquitous programming language; however, its design limits its potential as a reliable and high-performance systems language. Unfortunately, not every developer is aware of its limitations.
OK, so “hate” is a strong word, but hopefully this click-baits enough folks into reading this article with a clearer, more rational and unbiased demeanor than what the title suggests. Without descending into hyperbole, what I hope to share with you are all the things that in my view make Python a very problematic programming language for building reliable, high-performance distributed systems. Python’s problems are obvious to just about every engineer who writes serious mission-critical, production-grade code in the modern enterprise. My intention is not to throw shade on Python and Python developers, but rather to lay out all the considerations to weigh before using Python exclusively for your service-oriented architecture. Below are lessons I learned from maintaining several large Python codebases over the last few years. For programmers approaching Python as a tool-for-all from less intensive software development backgrounds, its pitfalls rarely emerge from under the rug. With that prefaced, let’s pull the rug from under.
Weakly/Dynamically Typed Members
Python is a dynamically typed programming language (it is often loosely described as weakly typed, although strictly speaking its runtime typing is strong). Static type safety is an indispensable part of building reliable software in large programs. All developers should familiarize themselves with these terms. I should start off with a simple example:
def foo(x):
    if is_valid(x):
        return "hello world"
    else:
        return bar(x)
In only 5 lines of code, I’ve already written something laden with potential problems. If I am a programmer viewing this code for the first time, my immediate question is “what does foo() do?”. Maybe there’s a unit test to which I can refer; but if there isn’t (which may sadly be the case), how am I to make sense of this code?
It’s virtually impossible to statically analyze this snippet without looking at the function definitions of is_valid(x) and bar(x), and those methods may have references to other methods, and so on and so forth. The first glaring issue is that I have no guarantee that is_valid() returns a bool type. The second glaring issue is that I have no idea what bar(x) returns. Does it return a str as the first condition suggests? If it doesn’t, then what I have is a method that in some cases returns a string and in other cases does not. Let’s say bar(x) returns an int; this would mean that foo(x) could return a string or an integer. We could make things slightly safer if we annotate the code above with type hints (available in Python 3):
def foo(x: str) -> str:
    if is_valid(x):
        return "hello world"
    else:
        return bar(x)
But these guard rails are never enforced by the interpreter at runtime; they only help if the code maintainers check call sites with a static type checker or linter throughout the CI/CD process. Python evangelists tout this as a feature of the language: that types are dynamic. However, as the complexity of a codebase grows, the cost of interpreting dynamically typed members can turn into a huge developer productivity bottleneck and add unnecessary maintenance overhead. Humans must expend time either manually inferring types or writing boilerplate code that validates types at every critical call site.
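To make the point about annotations concrete, here is a minimal sketch. The bodies of is_valid and bar are hypothetical stand-ins (the article never shows them), and bar is deliberately written to return an int so that the mismatch is visible.

```python
def is_valid(x: str) -> bool:
    # Hypothetical check: treat any non-empty string as valid.
    return len(x) > 0


def bar(x: str) -> int:
    # Hypothetical helper that returns an int, not a str.
    return len(x)


def foo(x: str) -> str:
    if is_valid(x):
        return "hello world"
    # The annotation promises str, but the interpreter does not enforce it.
    # A static checker such as mypy would report this line as an error.
    return bar(x)


if __name__ == "__main__":
    print(type(foo("abc")).__name__)  # str
    print(type(foo("")).__name__)     # int, and no exception is raised
```

Running the file directly raises nothing; the mismatch only surfaces if a static checker is part of the workflow.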
The example above is only 5 lines of code. Just imagine if you had to do this for 5,000 lines of code! Now multiply this by the number of services and service edges in your cluster. In distributed systems, services serialize and deserialize data over networks. Serialization formats, like JSON or Protocol Buffers, can offer varying degrees of type safety. Exchanging JSON payloads over HTTP is roughly equivalent to exchanging dict objects between Python apps. But for services exchanging more strongly typed payloads using something like Protobufs or Thrift, Python degrades the type-check guards that are built into these serialization formats once they are converted to POPOs (Plain Old Python Objects).
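The hand-written validation boilerplate mentioned above might look like the following minimal sketch. The function name parse_user and the two-field schema are hypothetical, chosen only for illustration; a real service would have its own payload shape.

```python
import json


def parse_user(payload):
    """Decode a JSON payload and validate field types by hand."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise TypeError("payload should decode to a JSON object")
    # Hypothetical schema: "id" must be an int and "name" must be a str.
    expected = {"id": int, "name": str}
    for field, field_type in expected.items():
        if field not in data:
            raise ValueError("missing field: " + field)
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, field_type):
            raise TypeError("field %r should be %s" % (field, field_type.__name__))
    return data
```

Every service boundary that accepts a plain dict needs some version of this check, which is the maintenance cost the paragraph above describes.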
In Python a good number of errors are discovered only at runtime, which is another problematic issue. The fact that I cannot, in the absence of perfect test coverage, verify the type correctness of someone’s code through visual inspection is a big red flag. This is one of the reasons why type-checking compilers were invented: a compiler for a statically typed programming language will perform all the type checks necessary to root out things that should just never happen at runtime. Type-checking compilers offer an extra level of security and defense for the reliability of your code, and reduce the risk of software errors at deploy-time. Twitter’s migration from Ruby to Scala is one of the most renowned examples that highlight the value and importance of this. As they share from their experience, the absence of type safety in Python (much like Ruby) should be a concern for anyone writing serious systems code. To improve static type checking in Python, developers made libraries like toasted-marshmallow, but most projects do not use them, either because they do not belong to the standard toolchain or because retrofitting legacy code with boilerplate annotations takes significant effort when type safety has been treated as an afterthought.
The Global Interpreter Lock
Modern software is multithreaded and multicore, and so just about every modern programming language offers concurrency primitives. Python ships with a threading module; but due to how the reference CPython interpreter is implemented, only one thread can execute Python bytecode at a time. This is due to the infamous Global Interpreter Lock (GIL). In an age when the only computers around had a single core, this was nothing to fuss over. But in today’s world where computers are built with multiple processors, the GIL introduces contention between threads running on different cores for CPU-bound tasks and thus imposes a frustrating constraint on concurrent programs that are compute-heavy.
Consider the code below (which I borrowed from a magnificent talk by David Beazley on the GIL):
import time
def countdown(n):
    while n > 0:
        n -= 1

COUNT = 80000000
start = time.time()
countdown(COUNT)
end = time.time()
print("elapsed seconds:", end - start)
All I’m doing is counting down by 1 from 80 million until I reach zero and timing the duration it took to do so. On my laptop this took about 3.46 seconds. But my laptop has multiple cores, so maybe I can put them to greater use by splitting the work. Naïvely, we would expect the implementation below to perform faster:
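import time
from threading import Thread
def countdown(n):
    while n > 0:
        n -= 1

COUNT = 80000000
t1 = Thread(target=countdown, args=(COUNT // 2,))
t2 = Thread(target=countdown, args=(COUNT // 2,))
start = time.time()
t1.start()
t2.start()
t1.join()
t2.join()
end = time.time()
print("elapsed seconds:", end - start)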
Here I’m splitting my countdown work in half between two threads in hopes of achieving a faster runtime. Since each thread is CPU-bound, they get scheduled on different cores and then contend over acquiring and releasing the GIL. The cost of this “GIL battle” adds up, as the elapsed time to run this code on my computer is approximately 5.87 seconds, almost twice as slow! Bizarre stuff, but not so bizarre once you understand Python’s concurrency model.
The pitfalls introduced by the GIL are well known (read “Python’s Hardest Problem” Part 1 and Part 2) and come with real performance implications.
Python is relatively slow compared to programming languages that run closer to the operating system. The run time of the countdown example above is orders of magnitude faster when implemented in other language runtimes. As shown above, multithreaded concurrency is a weakness of Python for CPU-bound tasks, and so is parallel processing in similar tests. Python’s raw speed when it comes to performing a variety of computational tasks or I/O-bound tasks is not stellar as demonstrated by several standard-bearing benchmarks. Python also suffers from a very high function invocation overhead. There are implementations such as PyPy that are fundamentally faster than the CPython reference implementation, but do not offer the same breadth of libraries already supported by the CPython ecosystem.
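One way to see the function invocation overhead for yourself is a small timing harness like the sketch below. The helper names (add_one, loop_inline, loop_with_calls, measure) are made up for this illustration, and the numbers will vary by machine and interpreter, so no expected timings are given.

```python
import timeit


def add_one(n):
    return n + 1


def loop_inline(iterations):
    total = 0
    for _ in range(iterations):
        total = total + 1  # arithmetic done inline
    return total


def loop_with_calls(iterations):
    total = 0
    for _ in range(iterations):
        total = add_one(total)  # same arithmetic behind a function call
    return total


def measure(iterations=100000, repeats=5):
    # Take the best of several runs to reduce noise from other processes.
    inline = min(timeit.repeat(lambda: loop_inline(iterations), number=1, repeat=repeats))
    called = min(timeit.repeat(lambda: loop_with_calls(iterations), number=1, repeat=repeats))
    return inline, called


if __name__ == "__main__":
    inline_seconds, called_seconds = measure()
    print("inline:", inline_seconds, "with calls:", called_seconds)
```

Both loops compute the same result; the only difference between them is the per-iteration call, so the gap between the two timings is a rough proxy for call overhead.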
The performance limitations of Python make it ill-suited for real-time processing or stream-based systems that move or manipulate large volumes of data across several machines with low-latency — a virtually tautological fact for engineers building high-speed distributed systems.
The Great Schism: Python2 vs Python3
If you’re doing development on Mac or Ubuntu, you likely already have Python 2.7 installed (Windows normally does not come with Python pre-packaged). There’s a good chance that when you first started programming in Python, no one told you about this thing called Python 3 (no one told me at least). And so if you were moseying along obliviously in the world of Python 2, it may have come as a surprise to you that some Python programs work fine on your colleagues’ machines but on your machine a bunch of syntax errors appear:
$ python main.py
File "main.py", line 6
yield from generator2()
SyntaxError: invalid syntax
And you might have also been further dumbfounded as to why programs that run just fine on your machine did not work on your colleagues’ machines:
$ python main.py
File "main.py", line 11
print "running script"
SyntaxError: Missing parentheses in call to 'print'
After some head scratching (and maybe after a bit of time banging your head on your desk), you discovered that the version of python installed on your computer links to a Python 2 interpreter and the version of python installed on your colleague’s computer links to a Python 3 interpreter. Upon this discovery, you inform Colleague A that your code only works on Python 2. Your colleague’s reply is “Upgrade to Python 3.” So you head back to your desk, and just as you’re about to install Python 3, Colleague B, who has more detailed knowledge of the codebase to which you are contributing, overhears this conversation and tells you “It’s OK, the code is only supposed to work on Python 2 anyways”. But wait a minute, if you stick to just Python 2, how are you supposed to run the code Colleague A gave you to test that only works on Python 3? Get ready to feel the squeeze!
If you work in a small organization, it may be relatively easy to standardize on a version of Python that is tailored to the codebase developed and maintained by that organization. If you work in a large organization, there are almost certainly two camps of developers split along a Python 2 and Python 3 divide. In a large organization with a history of bifurcated development, it is virtually impossible to close the divide. This divide has been widening for years, a divide that I’d like to call “The Great Schism.”
The Schism began in 2008 with the release of Python 3. Many thought this would be the future of Python. In actuality this was not the case. Several legacy projects had been developed on Python 2. I myself worked on a codebase for three years that only worked on Python 2.4. Migrating to Python 3 or even 2.7 was impossible due to vendor-specific libraries we were using. These pain points are not unique. Over time, the wider developer community has created tools like 3to2 and six, which were all well-intentioned gestures to improve syntax compatibility across libraries, but were not perfect solutions as there is always some code that cannot be converted.
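Short of converting code, a team can at least make the interpreter mismatch fail loudly instead of surfacing as a confusing SyntaxError. The sketch below is one hedged way to do that; the helper names interpreter_summary and require_major are hypothetical, and the guard only helps if it runs in an entry-point file before any version-specific module is imported (a syntax error in the same file is raised before any of its code runs).

```python
import sys


def interpreter_summary(version_info=None):
    """Return a short label such as 'Python 2.7' for a version tuple."""
    if version_info is None:
        version_info = sys.version_info
    return "Python %d.%d" % (version_info[0], version_info[1])


def require_major(major, version_info=None):
    """Raise early, with a readable message, on the wrong major version."""
    if version_info is None:
        version_info = sys.version_info
    if version_info[0] != major:
        raise RuntimeError(
            "this script needs Python %d, but is running on %s"
            % (major, interpreter_summary(version_info))
        )


if __name__ == "__main__":
    sys.stdout.write("running on %s\n" % interpreter_summary())
```

The code itself sticks to syntax that both Python 2 and Python 3 accept, which is why it avoids the print statement and f-strings.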
Thus Python has limitations on writing clean backwards and forwards compatible code. These limitations are no light matter, especially for teams who own mission-critical software. In 2014 at a talk hosted in Uber’s San Francisco headquarters, Kenneth Reitz went as far as to say:
“I am very very afraid for the future of Python. And I think that we should all be very afraid. I am absolutely terrified, to be honest.”
Hearing these words on the state of division from not only a well-respected Heroku engineer, but a core member of the Python Software Foundation (PSF) should be cause for alarm. For years the community has recommended Python 3 as the favorable choice for doing any modern Python development, but not everyone is heeding this call. The PSF is doing everything it can to move developers to Python 3. In fact, it has set an end-of-life date for Python 2, which will no longer be supported as of 2020. Given the nature of this divide, Python is far from Zen.
Encapsulation and Packaging
Class abstractions are leaky in Python. Anyone can access code from almost any place. Classes may have members that are unsafe when executed outside their class definition. By convention, members with a single leading underscore are considered “protected,” and members with double leading underscores are considered “private”:
However, Python does not have a real privacy model. Instead the interpreter resorts to mangling to protect namespaces when using inheritance, but still allows access to mangled members:
>>> f = Foo()
This is unsettling if you’re coming from a Java or C/C++ world and are expecting access modifiers. They simply do not exist in Python. There are no mechanisms that facilitate encapsulation of data, data access, and the methods that act on that data.
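A minimal sketch of the conventions and of name mangling follows. The class body and attribute names are hypothetical, written only to illustrate the behavior described above.

```python
class Foo(object):
    def __init__(self):
        self.public = "anyone can read this"
        self._protected = "protected by convention only"
        self.__private = "name-mangled, not hidden"


f = Foo()
print(f._protected)     # works: the single underscore is only a convention
print(f._Foo__private)  # works: the mangled name is still reachable

try:
    f.__private         # the unmangled name does not exist outside the class
except AttributeError as exc:
    print("AttributeError:", exc)
```

Inside the class body the interpreter rewrites __private to _Foo__private, which avoids accidental clashes in subclasses but does nothing to stop a determined caller.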
Python packages are also an awkward thing. Every folder needs to have an __init__.py file, even if it’s empty, for it to be an importable package (this is not needed in Python 3). Typically only package-wide definitions should go in that file. Core logic should not be placed in these __init__.py files. A Python project may be structured as follows:
│ ├── __init__.py
│ ├── abc
│ │ ├── __init__.py
│ │ └── seq.py
│ ├── util.py
│ └── xyz
│ ├── __init__.py
│ └── par.py
│ ├── __init__.py
│ └── core.py
If the file has a different name than its package directory (e.g. util.py and its package bar), import statements inside that file use absolute paths rooted in the top-level package. However, if you have a file in a package directory of the same name (e.g. rename util.py to bar.py), one must use relative paths in the import statements of that file; otherwise ImportError exceptions will end up in the stack trace the next time main.py executes. And since these kinds of exceptions are only detectable at runtime, developers must manually track and resolve import statements without the help of any compiler tools. Inconsistent conventions are never fun to keep track of.
Disorganized Build System
Building a Python project and importing project dependencies can be messy. If you’re doing serious software development in Python, it is almost always messy. Consider the numpy library, a popular numerical programming module that enchants developers searching for the look and feel of MATLAB, Octave, or R. Let’s say you want to install this on your laptop (running Mac OS X because that’s what I use), except that when you try to install it you get this error:
$ pip install numpy
> error: could not create '/Library/Python/2.7/site-packages/numpy': Permission denied
What almost no one tells you is that you should never use the system-installed Python for development purposes. Steer clear of any sudo pip commands. Chances are you, just like me, never heard that bit of advice from anyone until much later in your foray with Python. If you’re savvy you’re using a non-system version of Python installed with pyenv. But then the question becomes, in the face of over 300 versions of Python from which to choose, which version of Python should you install?
$ pyenv install --list | wc -l
I’ll defer that question to your team, and assume you’ve made the right choice. On your Mac, you unlink the python command from the system-installed version to the newly installed developer-managed Python version, and without any trouble get numpy installed. You do all sorts of great development work on your project for weeks, and now it’s time to deploy your app. You hand it over to your DevOps or SRE team, but they come back to you with some questions about a build failure that reads:
RuntimeError: Broken toolchain: cannot link a simple C program
And that’s just the TL;DR. The entire build failure log actually looks like this.
There is no short answer as to what is fully going on here, but simply put, Python modules are distributed either as prebuilt packages (a Wheel or an Egg) or as source distributions. When pip cannot find a prebuilt numpy wheel for the target platform, it falls back to the source distribution, which depends on a handful of system libraries, such as atlas, to compile properly. To compile this library and all its dependencies, the machine on which your app is hosted or built needs a C compiler like clang installed to get numpy working. Hopefully, you’re using something like Docker to build these dependencies into a container image. If you’re using a Debian-based image you need to do something like this:
FROM ubuntu:latest
RUN apt-get update && \
    apt-get install -y \
    python-pip && \
    pip install --upgrade pip && \
    pip install numpy
Or if you’re using the Alpine image:
FROM python:2.7-alpine3.6
RUN apk add --update --no-cache build-base && pip install numpy
The end product after successfully building these images is a bunch of bloated artifacts that are 4x larger than their base container images, with resulting sizes on the order of hundreds of megabytes even when compressed!
The bloat and hairiness of installing Python packages is appalling: just to do a simple one-line import numpy statement requires 285 MB to 529 MB of supporting code! Yikes. Disk space is cheap these days, and there are things far worse in the world than bloated container images. However, the fact of the matter is that building Python projects on modern deployment infrastructure can be a very disorganized and unclean process from the perspective of continuous integration and continuous delivery. The Python dependency installation tools will scatter files and links throughout your file system. The above example shows that, even with good isolation tools like virtualenv or Docker, there are all sorts of gotchas and forks in the road that can result in inefficiencies of both time and space. And this was all just to get one library working. It doesn’t take much imagination to extrapolate this ugliness to Python projects with several library dependencies.
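When debugging this kind of build mess, it can help to confirm which interpreter is actually running and whether it is isolated from the system Python. The sketch below is a best-effort check rather than a definitive one; the function names in_virtual_environment and interpreter_report are hypothetical, and the check relies on sys.base_prefix (set by venv on Python 3.3+) and sys.real_prefix (set by older virtualenv releases).

```python
import sys


def in_virtual_environment():
    """Best-effort check for venv or virtualenv isolation."""
    # venv sets sys.base_prefix; older virtualenv releases set sys.real_prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base_prefix != sys.prefix


def interpreter_report():
    """Collect the facts most useful when a build picks the wrong Python."""
    return {
        "executable": sys.executable,
        "version": "%d.%d.%d" % tuple(sys.version_info[:3]),
        "isolated": in_virtual_environment(),
    }


if __name__ == "__main__":
    for key, value in sorted(interpreter_report().items()):
        print(key, "=", value)
```

Printing this report at the start of a CI job is a cheap way to catch a build that silently fell back to the system interpreter.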
Rather, what I hope to have demonstrated in one place is that — given the kinds of complexities that can easily emerge in enterprises running mission-critical distributed systems in production — choosing Python as a sanctioned language de jure for your team comes with all kinds of considerable tradeoffs. Eventually the forces of entropy will take their toll; and depending on the circumstances, the business outcome will either be frustrated or happy people — be they developers or customers.
Thank you to my colleagues — Bianca Tamayo, Ellinor Crux, George Wu, Joel Gibson, and Tom Paoletti — for their generous feedback when I drafted this write-up.