10 Reasons to Prefer Native Tooling to Subprocesses
If you code, you probably use a terminal emulator. Maybe you build Docker images with docker build, explore a database with mysql, or manage cloud infrastructure with aws or gcloud.
Sometimes you want to use these utilities in a Python script. This can be done using the subprocess module. For example, docker run hello-world becomes subprocess.run(["docker", "run", "hello-world"]).
This works just fine, but there is another way. With the Docker SDK for Python installed (pip install docker), the same result is achieved with docker.from_env().containers.run("hello-world").
And it’s not just Docker:
- Instead of subprocess.run(["mysql", ...]), there is mysql-connector-python;
- For gcloud, there are the Google Cloud Platform Python clients;
- For aws, there is boto3;
- For git, there is gitpython;

and the list goes on. Modules that allow a task to be completed without using subprocess may be termed native tooling. In this article, we will look at 10 reasons why native tooling generally beats using subprocess.
1. Native Datatypes
When you invoke subprocess.run, a CompletedProcess object is returned. This can be interrogated for the return code and, if you asked for them to be captured (for example with capture_output=True), anything written to stdout and stderr.
Native tooling, on the other hand, can return whatever Python object the designer feels is most appropriate, anything from a str, bool or dict to a custom object of arbitrary complexity.
Native tooling can also accept native datatypes as arguments, which makes it easier to hand data to it in the first place. With subprocess, by contrast, every argument has to be turned into a command-line string.
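To make the difference concrete, here is a minimal sketch. Rather than depend on a real CLI, it uses a Python one-liner (run with sys.executable) as a stand-in tool that prints JSON, and a hypothetical get_status() function as a stand-in for a native library call; neither comes from a real package.

```python
import json
import subprocess
import sys

# Stand-in CLI: prints a status report as JSON on stdout.
CLI = "import json; print(json.dumps({'name': 'web', 'running': True, 'ports': [80, 443]}))"

# With subprocess, the result is text that must be captured and then parsed.
completed = subprocess.run(
    [sys.executable, "-c", CLI], capture_output=True, text=True, check=True
)
print(type(completed.stdout).__name__)  # str
status_from_cli = json.loads(completed.stdout)


def get_status() -> dict:
    """Hypothetical native call: hands back a Python object directly."""
    return {"name": "web", "running": True, "ports": [80, 443]}


status_native = get_status()
print(status_from_cli == status_native)  # True
```

Both routes end up with the same dict, but only the native one gets there without a round trip through text.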
2. Dependency Management
For a third party to run your Python application, they will need a compatible version of Python. Once Python is installed, pip will take care of dependency management.
But pip only deals with Python packages. If you need the user to install a command line tool that can’t be installed using pip, such as the Google Cloud CLI, the user will need to install it manually. If the user has the wrong version, your program will still run, but may fail at some unknown point for an unclear reason. Contrast this with pip, which checks the version constraints your package declares at install time.
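The sketch below illustrates the failure mode, assuming a system where launching a missing executable raises FileNotFoundError (true on both POSIX and Windows). require_cli is a hypothetical helper and the tool name is deliberately fake. Even with such a check, a subprocess-based script only learns that some version is on PATH, not whether it is a compatible one.

```python
import shutil
import subprocess


def require_cli(name: str) -> str:
    """Hypothetical helper: return the path to `name` or fail straight away.

    pip cannot install tools such as the Google Cloud CLI, so the best a
    subprocess-based script can do is check for them at start-up. This only
    proves that *some* version is on PATH, not that it is a compatible one.
    """
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(f"{name!r} was not found on PATH; please install it manually")
    return path


# Without such a check, a missing tool only surfaces at the moment it is used,
# which may be deep inside a long-running job.
try:
    subprocess.run(["definitely-not-a-real-cli-tool"])
    late_error = None
except FileNotFoundError as exc:
    late_error = exc
    print("failed late:", exc)
```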
3. Statefulness
Some native tooling can do smart things by keeping state between calls. Database drivers are a good example. Using native tooling enables:
- reuse of a single database connection between operations;
- grouping of changes into batches for efficiency.
With subprocess, neither of these is reasonably possible, because the process terminates every time an operation is executed.
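Python’s own sqlite3 module is a convenient stand-in for a driver such as mysql-connector-python, since it ships with the standard library. In this minimal sketch, one connection is opened and reused, and three inserts are committed together as one transaction. Doing the same through subprocess would mean starting a fresh database client, with a fresh connection, for every statement.

```python
import sqlite3

# One connection, opened once and reused for every operation below.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT NOT NULL, age INTEGER)")

rows = [("alice", 31), ("bob", 27), ("carol", 45)]

# The inserts are batched into a single transaction: the `with` block commits
# once at the end, or rolls back if an exception is raised inside it.
with conn:
    conn.executemany("INSERT INTO users (name, age) VALUES (?, ?)", rows)

# The same connection serves the query; results come back as Python tuples.
oldest = conn.execute("SELECT name FROM users ORDER BY age DESC LIMIT 1").fetchone()
print(oldest)  # ('carol',)
```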
4. Reducing Complexity
Any piece of software comprises many layers of abstraction that take us from high-level languages like Python down to machine code. We do this because, as human beings, there is only so much information we can reason about at once. We do not do this because abstractions are inherently good; quite the opposite.
All non-trivial abstractions are leaky, meaning at least some knowledge of how they work under the hood will at some point be required to maintain a project that depends on them. Using subprocess often adds layers of abstraction that serve no purpose.
For example, whereas the Docker SDK for Python talks directly to the Docker daemon, using subprocess means the Python application talks to the Docker CLI, which in turn talks to the Docker daemon. The former involves one fewer layer of abstraction.
5. Raising Exceptions
When a subprocess fails, it may return a non-zero exit code and/or print something to stderr. You can inspect these and work out whether something went wrong, but exceptions are preferable for two reasons:
- Errors arising from subprocesses must be explicitly checked for in every error scenario, or they may go unnoticed. Exceptions are obvious by default, and thus shine a light on bugs early on.
- Exceptions bubble up through the call stack until a handler is found. A failed subprocess raises nothing unless you pass check=True, and even then you get a generic CalledProcessError carrying only the exit code and any captured output, rather than an exception that describes what actually went wrong.
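Here is a sketch of the three situations side by side. A Python one-liner stands in for a failing CLI (its “disk full” message is made up), and the standard library’s sqlite3 driver serves as the example of native tooling.

```python
import sqlite3
import subprocess
import sys

# Stand-in CLI that fails: it writes to stderr and exits with status 3.
FAILING_CLI = [
    sys.executable,
    "-c",
    "import sys; print('disk full', file=sys.stderr); sys.exit(3)",
]

# 1. By default, run() returns normally even though the tool failed.
silent = subprocess.run(FAILING_CLI, capture_output=True, text=True)
print(silent.returncode)  # 3, noticed only if you remember to check it

# 2. check=True raises, but always the same generic exception type.
try:
    subprocess.run(FAILING_CLI, capture_output=True, text=True, check=True)
except subprocess.CalledProcessError as exc:
    generic_error = exc
    print(type(exc).__name__, exc.returncode, exc.stderr.strip())  # CalledProcessError 3 disk full

# 3. A native driver raises a specific exception type you can catch by name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO items VALUES (1)")
try:
    conn.execute("INSERT INTO items VALUES (1)")  # duplicate primary key
except sqlite3.IntegrityError as exc:
    specific_error = exc
    print(type(exc).__name__, exc)
```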
6. The Standard Library
While not guaranteed, it’s likely that native tooling written in Python will make use of the standard library. Many third-party modules use the built-in logging module, for instance. Assuming you are using logging (and you probably should be), you will be able to manage all your logs, yours and your dependencies’, through one interface.
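As a minimal sketch: "some_library" below is a hypothetical stand-in for a third-party module that logs through the standard logging module, and ListHandler is an illustrative handler that simply collects records. Configuring the root logger once is enough to capture both your application’s records and the library’s.

```python
import logging


class ListHandler(logging.Handler):
    """Illustrative handler that keeps every record in a list."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


# Hypothetical third-party module: logs via a named logger, as libraries should.
library_logger = logging.getLogger("some_library")


def library_function() -> None:
    library_logger.info("connecting to service")


# The application configures logging once, on the root logger...
handler = ListHandler()
root = logging.getLogger()
root.setLevel(logging.INFO)
root.addHandler(handler)

# ...and records from both the app and the library propagate to it.
logging.getLogger("my_app").info("starting")
library_function()

print([(r.name, r.getMessage()) for r in handler.records])
# [('my_app', 'starting'), ('some_library', 'connecting to service')]
```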
Another example is asyncio. If you are using asyncio for async concurrency, a third-party module may provide out-of-the-box coroutines you can leverage, such as those offered by the GCP Python KMS SDK.
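A sketch of why that matters: fetch_key below is a hypothetical coroutine standing in for a method on an async client, with asyncio.sleep simulating network latency. Because it is a native coroutine, several calls can be awaited concurrently with asyncio.gather, so the three waits overlap rather than running back to back.

```python
import asyncio


async def fetch_key(name: str) -> str:
    """Hypothetical stand-in for an async client method."""
    await asyncio.sleep(0.1)  # simulated network round trip
    return f"key:{name}"


async def main() -> list[str]:
    # gather runs the coroutines concurrently and returns results in order.
    return await asyncio.gather(fetch_key("a"), fetch_key("b"), fetch_key("c"))


keys = asyncio.run(main())
print(keys)  # ['key:a', 'key:b', 'key:c']
```

Subprocesses can also be driven from asyncio via asyncio.create_subprocess_exec, but what comes back is still bytes on a pipe rather than Python objects.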
There is no guarantee that native tooling will support your preferred standard library package, but sometimes is better than never.
7. Efficiency
Whether you use subprocesses or native tooling, the underlying work, and much of the resulting set of system calls, will be essentially the same. Using native tooling means only one process, Python, has to do that work, whereas using subprocess adds the overhead of starting a second process and passing data to and from it.
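A rough way to see that overhead for yourself, using only the standard library: the sketch below times the same trivial computation done in-process and done by launching a second Python interpreter. It is not a rigorous benchmark and absolute numbers vary by machine, but process start-up typically dominates.

```python
import subprocess
import sys
import time
from collections.abc import Callable


def native_call() -> int:
    # Stand-in for a native library call: ordinary in-process work.
    return len("hello world")


def subprocess_call() -> int:
    # The same work, done by launching a second Python interpreter.
    completed = subprocess.run(
        [sys.executable, "-c", "print(len('hello world'))"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(completed.stdout)


def time_it(fn: Callable[[], int], repeats: int = 5) -> float:
    """Total wall-clock seconds for `repeats` calls to fn."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return time.perf_counter() - start


native_seconds = time_it(native_call)
subprocess_seconds = time_it(subprocess_call)
print(f"native: {native_seconds:.6f}s, subprocess: {subprocess_seconds:.6f}s")
```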
8. You Know Python
After running subprocess.run, what the computer is doing is effectively opaque to you. Some program is running, but the details of how it works are hidden. For the most part that is fine, but from time to time it is useful to be able to delve into the source code of third-party tools.
For example, maybe you want to add a breakpoint() for debugging, investigate a stack trace, or understand why an API isn’t behaving as expected. Since you know Python anyway, it doesn’t hurt to have that extra level of insight into how your tooling works under the hood.
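Because native tooling written in Python is just Python, the standard inspect module can take you straight to its source. The sketch uses json.dumps from the standard library as the example; the same calls work on most pure-Python third-party packages, though not on compiled extension modules.

```python
import inspect
import json

# Find the file that defines json.dumps...
source_file = inspect.getsourcefile(json.dumps)
print(source_file)  # a path ending in json/__init__.py

# ...and read the function's source directly.
source = inspect.getsource(json.dumps)
print(source.splitlines()[0])  # the "def dumps(..." line

# To step through it instead, you could place breakpoint() just before a call
# such as json.dumps({"a": 1}) and use the debugger's "step" command.
```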
9. An API
Oftentimes subprocess is used to invoke other programs via command line interfaces (CLIs). CLIs are a means of human-machine interaction, allowing a user to type commands that the computer then carries out.
Native tooling, by contrast, was built for consumption by other applications. It exposes an application programming interface (API) designed specifically for this use case.
For example, CLIs often expect user input, displaying a message like:
Are you sure you want to continue? (y/N)
It is possible to write a program that deals correctly with such human-oriented interfaces (see the Unix yes program), but it’s easier to just use something that was designed from the ground up for applications.
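The sketch below shows what dealing with such a prompt involves. A Python one-liner stands in for a CLI that asks for confirmation; subprocess has to play the human by writing "y\n" to its stdin, and the prompt text ends up mixed into the output you then have to parse. (When stdin is not a terminal, Python’s input() writes its prompt to stdout.)

```python
import subprocess
import sys

# Stand-in CLI that asks for confirmation on stdin, as many real tools do.
PROMPTING_CLI = (
    "answer = input('Are you sure you want to continue? (y/N) ')\n"
    "print('deleted' if answer.strip().lower() == 'y' else 'aborted')\n"
)

# subprocess has to type the answer on the user's behalf.
result = subprocess.run(
    [sys.executable, "-c", PROMPTING_CLI],
    input="y\n",
    capture_output=True,
    text=True,
    check=True,
)

# The prompt and the real output arrive together on stdout.
print(repr(result.stdout))
```

A native API would typically expose the same choice as an ordinary argument (for example, a force or confirm flag), so no prompt is involved at all.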
10. Productivity Tools
When using native tooling, developer productivity tools such as Rope for autocompletion and mypy for type checking can discover its interface. With subprocess, you are on your own.
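A small illustration, assuming mypy is run over the file: RunOptions and run_container are hypothetical, native-style stand-ins rather than any real library’s API. Misspelled or mistyped arguments to them are reported by the type checker before the program runs, whereas a typo inside an argv list is just another str.

```python
from dataclasses import dataclass


@dataclass
class RunOptions:
    """Hypothetical native-style options object; not from any real library."""

    image: str
    detach: bool = False


def run_container(options: RunOptions) -> str:
    # Stand-in for a native API call; just describes what it would do.
    mode = "detached" if options.detach else "attached"
    return f"running {options.image} ({mode})"


# mypy checks names and types here: RunOptions(image="x", detatch=True)
# or RunOptions(image=42) would be flagged before the program runs.
print(run_container(RunOptions(image="hello-world", detach=True)))

# An argv list is just list[str]; mypy cannot tell that "--detatch" is a typo.
argv: list[str] = ["docker", "run", "--detatch", "hello-world"]
```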
Conclusion
There is nothing wrong with using subprocess, and indeed a Python script full of subprocess invocations still has many advantages over a shell script. But hopefully this article has convinced you that, for even moderately complicated workflows, the benefits of switching to native tooling where possible are manifold.