A template for Python projects

Gabriele Angeletti · Published in Analytics Vidhya · 7 min read · Dec 26, 2019

I am a lazy person. Every time I find myself doing the same thing more than twice, I automate it. It requires effort at first, but it is worth it in the long run. Starting a new Python project is one of those things, and today I want to share with you my blueprint for it. You can find the full template on GitHub. This setup is a good starting point for a small to medium-sized codebase, and it covers the common tasks:

  • Set up the development environment
  • Manage dependencies
  • Format your code
  • Run linting, static type-checking and unit-testing

In the next sections I will describe how these things are set up in the template. Note that a few things are missing, which I’m planning to add next:

  • A deployment script
  • A continuous integration pipeline

Managing Python versions — Pyenv

Managing multiple versions of Python, or of any language for that matter, is a painful experience, for many reasons: the system version that you can’t touch, the Python 2 vs 3 nightmare, two different projects that require different interpreters, and so on. Pyenv solves this problem: it is a version management tool that makes your life easier in a lot of ways. If you come from the JavaScript / Node world, it is similar to the popular n.

With pyenv you can easily:

  • Install a new version: pyenv install X.Y.Z
  • Set a version as global: pyenv global X.Y.Z
  • Set a version for the current shell by overriding the PYENV_VERSION environment variable
  • Set an application-specific version by creating a .python-version file
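For the last option, the .python-version file contains nothing but the version string (3.8.1 here is just an illustrative version):

```
3.8.1
```

With that file in place, pyenv selects the matching interpreter automatically whenever you work inside the project directory.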

Managing dependencies — Pipfile

If dealing with versions is painful, dealing with dependencies is even worse. Any non-trivial application depends on external packages, which in turn depend on other packages, and ensuring everyone gets the same versions can be rather challenging. In the Python world, dependencies have traditionally been managed through the requirements.txt file. It contains the packages your app needs, optionally with the required versions. The problem is that this file doesn’t handle recursive dependencies, that is, the dependencies of your app’s dependencies. Pipfile is a newer specification that aims to solve this. It has many advantages over requirements.txt, the biggest one by far being deterministic builds: Pipfile and its companion Pipfile.lock contain all the information needed to install exactly the same environment anywhere.

Let’s look at an example. Consider the following scenario: our application Ninja ducks depends on version 1.2.3 of a package called ninja, which in turn depends on another package called requests.

Example — requirements.txt

A requirements.txt file would look like this:

ninja==1.2.3

When running pip install -r requirements.txt, we install version 1.2.3 of ninja, because that's what the requirements say, and version 2.7.9 of requests, because that was the latest public version at the time. A couple of weeks later we deploy the application, but in the meantime requests has been upgraded to 3.0.0. If ninja relies on a feature of requests that has since been changed or removed, our application will crash. We could fix this problem by pinning requests in the requirements file as well, but you can see for yourself that this solution doesn't really scale.

Example — Pipfile

A Pipfile instead would look something like this:

[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
ninja = {version = "==1.2.3"}

From this we can run pipenv lock to generate a Pipfile.lock:

{
  "_meta": {
    "hash": {
      "sha256": "[long string]"
    },
    "pipfile-spec": 6,
    "sources": [
      {
        "name": "pypi",
        "url": "https://pypi.org/simple",
        "verify_ssl": true
      }
    ]
  },
  "default": {
    "ninja": {
      "hashes": [
        "[long string]",
        "[long string]"
      ],
      "version": "==1.2.3"
    },
    "requests": {
      "hashes": [
        "[long string]",
        "[long string]"
      ],
      "version": "==2.7.9"
    }
  }
}

As you can see, requests is there, even if we didn't mention it anywhere in our Pipfile. That's because Pipfile handles recursive dependencies through the Pipfile.lock file. During deploy, when we run pipenv install --deploy to install dependencies, the correct version of requests will be installed, regardless of the latest version available in the public registry.

  • Note 1: in the above I used a couple of pipenv commands; pipenv is the reference implementation of the Pipfile specification
  • Note 2: you need to add both Pipfile and Pipfile.lock to your repository, otherwise you will not be able to restore the same environment
  • Note 3: if you are currently using requirements.txt and want to migrate to Pipfile, here's a handy guide on how to do it

In the template, both pyenv and pipenv can be installed through the provided ./setup.sh script. It only supports Linux and macOS (some packages need to be installed manually on Linux).

Managing code — my favourite tools

Here’s a list, in no specific order, of the code quality tools I always use in my Python projects.

Formatting — Black

According to a book I have recently read, willpower is a limited resource. It is like a muscle: you can’t just stay focused for an entire day and expect the same level of productivity all along. That’s why when programming, I want to spend my time thinking about the important stuff, not about indentation, brackets, and so on. Everything that can be automated must be automated. I can see at least two major benefits of automated code formatting:

  • You cede control over formatting rules to the tool, which means you stop thinking about it
  • Since everyone is on board, you stop arguing with your team about whether the perfect line length should be 42, 79 or 110 (or at least you have just one big discussion at the beginning)

Black refers to itself as “the uncompromising Python code formatter”, and it is my favourite formatting tool. It is super simple to use, just run:

black {source directory}

Black has a lot of configurable options. The only one I use is a line length of 110. If you check the full project template, I have included a handy ./format_code.sh script that will format your code in a single command.
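To give a feel for what Black does, here is a small before/after sketch with default settings (the renamed copies are mine, so both versions can live in one file):

```python
# Valid but inconsistently formatted code:
x = { 'a':1,'b':2 }

def  f (a,b = 2):
    return a+b

# The same code after running Black with default settings
# (quotes normalised to double, spacing fixed):
y = {"a": 1, "b": 2}

def g(a, b=2):
    return a + b

# Behaviour is identical; only the layout changed.
print(x == y, f(1) == g(1))  # → True True
```

Since Black never changes a program's behaviour, you can adopt it on an existing codebase in one big commit and move on.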

Linting — Flake8

Linting is a rather basic code quality check that helps prevent simple bugs in your code. Things like typos, bad formatting, unused variables and so on. To me linting is super useful because:

  • You don’t have to check for minor details, hence you save time
  • Other developers don’t have to check for minor details, hence they save time

I use flake8 for linting. One feature I particularly like is the ability to ignore specific warnings and errors. For instance, I use a line length of 110, which goes against the PEP 8 style guide (79 characters is the recommended maximum). By turning off the corresponding error E501 I can safely use flake8 with any desired line length.
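As a sketch of such a setup (flake8 reads its configuration from several files; setup.cfg is one of them, and the settings here simply mirror my choices above):

```ini
[flake8]
# line length is handled by the formatter, so silence E501 entirely
extend-ignore = E501
```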

In the project template, you can run flake8 against your package with ./test.sh lint.

Type checking — Mypy

This is my favourite by far. Mypy brings static type checking to the Python world. I haven't written a single line of untyped Python since I discovered mypy. I won't go into the details of static type checking; I'll just show you a simple example taken from mypy's website:

Standard Python:

def fibonacci(n):
    a, b = 0, 1
    while a < n:
        yield a
        a, b = b, a + b

Typed Python:

from typing import Iterator

def fibonacci(n: int) -> Iterator[int]:
    a, b = 0, 1
    while a < n:
        yield a
        a, b = b, a + b

This is super useful because:

  • I’m much more likely to understand what a function does by looking at its signature
  • I can spot lots of errors before even running the code
  • I can check whether I’m correctly using a third-party library
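As a small sketch of the second point: mypy rejects the call below at check time, because "10" is a str where an int is expected. Without type checking, the bug only surfaces at runtime, and only once the generator is actually consumed:

```python
from typing import Iterator

def fibonacci(n: int) -> Iterator[int]:
    a, b = 0, 1
    while a < n:
        yield a
        a, b = b, a + b

# mypy flags this call statically: str passed where int is expected.
gen = fibonacci("10")  # no error yet, generators are lazy

# The failure only appears once the generator body evaluates a < n:
try:
    list(gen)
except TypeError:
    print("TypeError raised at runtime")
```

With mypy, this class of bug is caught in the editor or in CI, long before the code path is ever exercised.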

In the project template, you can run mypy against your package with ./test.sh type_check.

Testing — Pytest

Pytest is the best testing framework for Python. It gives you detailed info on why your tests are failing, can auto-discover your tests based on their names, has amazing support for fixtures, and a lot of useful plugins. Writing tests is super easy with pytest. Consider the following module, my_module.py:

def my_func(x: int) -> int:
    return x ** 2

To test this function, we create a module named my_module_test.py:

from . import my_module

def test_my_func():
    expected = 9
    actual = my_module.my_func(3)
    assert actual == expected

The main features I use from pytest, in random order, are:

  • pytest-cov: plugin that generates test coverage reports for your code, in a variety of formats
  • pytest-mock: plugin that adds a fixture for monkey-patching. Usage:

    def test_my_func(mocker):
        mocker.patch(...)

  • pytest-xdist: run unit tests in parallel. Especially useful for large codebases
  • pytest's marking feature. You can label tests by using a decorator:

    import pytest

    @pytest.mark.integration
    def test_my_integration_test():
        ...

Then you can run only the tests labeled as integration with pytest -m integration.

In the project template, you can run pytest against your package with ./test.sh unit_tests.

Others

Other dev. tools that I use in my projects are:

  • isort: automatically sorts imports and separates them into sections: standard library, third-party, first-party, etc. Again, I'm all about automation, and this tool removes another thing from my mind
  • vulture: a tool that checks for dead code, like unused functions, unused constants, etc. It's nice to keep your house clean, especially if you know about the broken windows theory

Please let me know your opinion in the comments below. The full code can be found here (instructions are in the readme).

Originally published at https://gabrieleangeletti.github.io.
