10 Reasons to Prefer Native Tooling to Subprocesses

Published in FullStackAI · 5 min read · Oct 16, 2020

If you code, you probably use a terminal emulator. Maybe you build docker images with docker build, explore a database with mysql or manage cloud infrastructure with aws or gcloud.

Sometimes you want to use these utilities in a Python script. This can be done using the subprocess module. For example,

docker run hello-world

becomes
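A minimal sketch of that call, assuming the docker CLI is installed and on the PATH (the helper name run_hello_world is illustrative, not part of any library):

```python
import subprocess


def run_hello_world() -> subprocess.CompletedProcess:
    """Run `docker run hello-world` through the Docker CLI."""
    # Passing the command as a list avoids shell quoting issues.
    return subprocess.run(
        ["docker", "run", "hello-world"],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the captured bytes to str
    )
```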

This works just fine, but there is another way. The Docker SDK for Python achieves the same result with:
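A sketch of the SDK equivalent, assuming the docker package is installed (pip install docker) and a Docker daemon is reachable. The wrapper function is illustrative:

```python
def run_hello_world():
    """Run the hello-world image via the Docker SDK for Python."""
    # Imported inside the function so this sketch can be loaded even on a
    # machine where the SDK is not installed.
    import docker

    client = docker.from_env()  # picks up DOCKER_HOST etc. from the environment
    # With the default detach=False, run() blocks and returns the container logs.
    return client.containers.run("hello-world")
```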

And it’s not just Docker:

Instead of subprocess.run(["mysql", ...]), there is mysql-connector-python;
For gcloud there are Google Cloud Platform Python clients;
For aws there is boto3;
For git there is gitpython;

and the list goes on. Modules that allow a task to be completed without using subprocess may be termed native tooling. In this article, we will look at 10 reasons why native tooling generally beats using subprocess.

1. Native Datatypes

When you invoke subprocess.run, a CompletedProcess object is returned. This can be interrogated to tell you the return code and anything written to stdout and stderr.

Native tooling, on the other hand, can return whatever Python object the designer feels is most appropriate. This may be anything from a str, bool or dict to a custom object of arbitrary complexity.

Native tooling can also accept native datatypes as arguments, facilitating the process of handing data over to them in the first place.
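The difference can be sketched with the standard library alone. Here a child Python process stands in for an external CLI, and describe_natively is a hypothetical native API. Both names and the data are made up for illustration:

```python
import json
import subprocess
import sys

# Stand-in for an external CLI: a child Python process that prints JSON.
CHILD = "import json; print(json.dumps({'name': 'web', 'replicas': 3}))"


def describe_via_subprocess() -> dict:
    """The caller gets text back and must parse it into a usable object."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD], capture_output=True, text=True
    )
    return json.loads(result.stdout)  # stdout is only a str until parsed


def describe_natively() -> dict:
    """A hypothetical native API can hand back a dict directly."""
    return {"name": "web", "replicas": 3}
```

Both functions return the same dict, but only the subprocess version has to agree with the tool on an output format and parse it.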

2. Dependency Management

For a third party to run your Python application, they will need a compatible version of Python. Once Python is installed, pip will take care of dependency management.

But pip only deals with Python packages. If your program depends on a command line tool that can’t be installed using pip, such as the Google Cloud CLI, the user will need to install it manually. If the user has the wrong version, your program will still run, but may fail at some unknown point for an unclear reason. Contrast this with pip, which checks version compatibility explicitly at install time.
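As a rough illustration of the asymmetry, here is what a program can check for itself using only the standard library. The helper names are illustrative:

```python
import shutil
from importlib import metadata


def cli_available(name: str) -> bool:
    """Return True if an executable called `name` is on the PATH."""
    # This says nothing about which version is installed.
    return shutil.which(name) is not None


def package_version(name: str):
    """Return the installed version of a pip-managed package, or None."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None
```

For a pip-managed package the version is recorded as metadata. For a CLI, the best a script can usually do is find the executable and then parse whatever its version flag prints.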

3. Statefulness

Some native tooling can do smart things by remembering previous commands. An example is database drivers. Using native tooling enables:

re-use of a single database connection between operations;
grouping of changes into batches for efficiency.

subprocess would not reasonably be able to do either of these, because the process terminates every time an operation is executed.
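A minimal sketch of both points, using the standard library's sqlite3 module as a stand-in for a database driver (the table and rows are made up for illustration):

```python
import sqlite3

# One connection is opened once and reused for every operation below.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# executemany sends the whole batch through the same connection, and the
# `with` block commits it as a single transaction.
rows = [("ada",), ("grace",), ("linus",)]
with conn:
    conn.executemany("INSERT INTO users (name) VALUES (?)", rows)

count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
conn.close()
```

The in-memory database makes the point sharper: it exists only for as long as the connection does, so it could not survive a series of separate CLI invocations.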

4. Reducing Complexity

Any piece of software comprises many layers of abstraction that take us from high level languages like Python to machine code. We do this because, as human beings, there is only so much information we can reason about at once. We do not do this because abstractions are inherently good — quite the opposite.

All non-trivial abstractions are leaky, meaning at least some knowledge of how they work under the hood will at some point be required to maintain a project that depends on them. Using subprocess often creates additional layers of abstraction that serve no purpose.

For example, whereas the Docker SDK for Python talks directly to the Docker daemon, using subprocess means the Python application talks to the Docker CLI, which in turn talks to the Docker daemon. The former entails one fewer layer of abstraction.

5. Raising Exceptions

When a subprocess fails, it can return a non-zero exit code and/or print something to stderr. You can inspect these and work out if something went wrong, but exceptions are preferable for two reasons:

Errors arising from subprocesses must be explicitly handled for every error scenario or the error may go unnoticed. Exceptions are obvious by default, and thus shine a light on bugs early on.
Exceptions bubble up through the call stack until a handler is discovered. Such magic is not available when using subprocess.
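The default behaviour can be sketched with a child Python process standing in for a failing CLI. Note that subprocess.run does accept check=True, which raises CalledProcessError on a non-zero exit code, but it is opt-in and carries only the exit code and captured output rather than a specific error type:

```python
import subprocess
import sys

# Stand-in for a failing CLI: a child process that exits with status 3.
FAILING = [sys.executable, "-c", "import sys; sys.exit(3)"]

# By default a failure is silent: nothing is raised, and the caller has to
# remember to inspect returncode.
silent = subprocess.run(FAILING)

# check=True opts in to an exception, but a generic one.
try:
    subprocess.run(FAILING, check=True)
    raised = False
except subprocess.CalledProcessError as exc:
    raised = True
    code = exc.returncode
```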

6. The Standard Library

While not guaranteed, it’s possible that software written in Python will make use of the standard library. Many third party modules use the built in logging module, for instance. Assuming you are using logging (and you probably should be), you will be able to manage all your logs through one interface.
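A small sketch of what "one interface" means in practice. The logger name some_library.client is hypothetical and stands in for any third party module that uses logging:

```python
import io
import logging

# Configure logging once, at the root, for the whole application.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.INFO)

# Application code and (hypothetical) library code log through the same tree,
# so both records reach the single handler configured above.
logging.getLogger("myapp").info("starting")
logging.getLogger("some_library.client").info("connected")

captured = stream.getvalue().splitlines()
```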

Another example is asyncio. If you are using asyncio for async concurrency, there is a possibility that a third party module will provide out of the box coroutines you can leverage, such as those offered by the GCP Python KMS SDK.

There is no guarantee that native tooling will support your preferred standard library package, but sometimes is better than never.

7. Efficiency

Whether you use subprocesses or native tooling, you will be making essentially the same set of system calls to do whatever it is you are doing. Using native tooling means only one process — Python — is required to make those system calls, whereas using subprocess entails the overhead of a second process.
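A rough way to see the overhead for yourself is to time the same trivial computation in-process and in a freshly spawned interpreter. This is a sketch, not a benchmark, and the absolute numbers will vary by machine:

```python
import subprocess
import sys
import time


def timed(fn) -> float:
    """Return the wall-clock seconds taken by one call to fn."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


# The same trivial computation, once in this process...
in_process = timed(lambda: sum(range(1000)))

# ...and once in a new interpreter, which adds process creation and
# interpreter start-up on top of the work itself.
child_process = timed(
    lambda: subprocess.run([sys.executable, "-c", "sum(range(1000))"], check=True)
)

print(f"in-process: {in_process:.6f}s, child process: {child_process:.6f}s")
```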

8. You Know Python

After running subprocess.run, what the computer is doing is effectively opaque to you. Some program is running, but the details of exactly how it works are unclear. For the most part that is fine, but from time to time, it is useful to be able to delve into the source code of third party tools.

For example, maybe you want to add a breakpoint() for debugging, investigate a stack trace, or understand why an API isn’t behaving as expected. Since you know Python anyway, it doesn’t hurt to have that extra level of insight into how your tooling works under the hood.

9. An API

Oftentimes subprocess is used to invoke other programs via command line interfaces (CLIs). CLIs are a method of human-machine interaction, allowing a user to type commands that the computer actions.

Native tooling, by contrast, was built for consumption by other applications. It exposes an application programming interface (API) that was designed specifically for this use case.

For example, CLIs often expect user input, displaying a message like:

Are you sure you want to continue? (y/N)

It is possible to write a program that deals correctly with such human-oriented interfaces (see the Unix yes program), but it’s easier to just use something that was designed from the ground up for applications.
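To illustrate, here is a sketch in which a child Python process stands in for an interactive CLI. The calling script has to know the prompt is coming and pipe in the expected keystrokes:

```python
import subprocess
import sys

# Stand-in for an interactive CLI: it prompts, then reads an answer from stdin.
PROMPTING_CLI = (
    "answer = input('Are you sure you want to continue? (y/N) ');"
    "print('continuing' if answer.strip().lower() == 'y' else 'aborted')"
)

# Feed the answer on stdin, as the yes program would.
result = subprocess.run(
    [sys.executable, "-c", PROMPTING_CLI],
    input="y\n",
    capture_output=True,
    text=True,
)
output = result.stdout.strip()
```

If the tool ever rewords the prompt or adds a second one, the piped input silently stops matching. A native API would typically take that decision as an explicit argument instead.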

10. Productivity Tools

When using native tooling, developer productivity tools, such as Rope for autocompletion and mypy for type checking, can discover its interface. With subprocess, you are on your own.

Conclusion

There is nothing wrong with using subprocess, and indeed a Python script full of subprocess invocations still has many advantages over a shell script. But hopefully this article has convinced you that for even moderately complicated workflows, the benefits of switching to native tooling where possible are manifold.