Tools to manage large-scale Python projects

plapadoo
Sep 11, 2019

Python is a perfect language for small scripts. In our experience writing reaktron, however, it doesn’t scale that well to bigger programs. In this article, we present the tools we use to ensure your Python code is…

  • refactoring-safe: a method rename shouldn’t cause you to manually revisit the whole code base
  • easier to understand: in large code-bases, what a method does isn’t always apparent from its name
  • easier to maintain with git. We’ll explain that one below.

mypy

Everyone knows Python is a dynamically typed programming language. Not everyone knows that Python also lets you annotate types, and that a tool can check those annotations before the code ever runs. mypy does just that. How does that help? Well, consider:

class Foo:
    def get_id(self):
        return "foo"

foo = Foo()
print(foo.get_d() + 3)

Note the two bugs: get_d is a typo for get_id, and we cannot add a string to a number. Without a type checker, both only show up when the line actually runs.

To use mypy, for best results, you have to add types. In the case above, we just have to add a type for the get_id function:

class Foo:
    def get_id(self) -> str:  # note the new type
        return "foo"

foo = Foo()
foo.get_d() + 3

Running mypy on the file gives:

"Foo" has no attribute "get_d"; maybe "get_id"?

And if we correct the code to

foo.get_id() + 3

we get:

Unsupported operand types for + ("str" and "int")

So there are lots of trivial bugs to be found this way.
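One way to make the snippet pass mypy is sketched below. It assumes the intent was to build a string rather than do arithmetic; that intent, and the name label, are our assumptions for illustration.

```python
class Foo:
    def get_id(self) -> str:
        return "foo"


foo = Foo()
# Typo fixed (get_id), and both operands of + are now strings.
label: str = foo.get_id() + "3"
print(label)
```

With the return type annotated, mypy can check every call site of get_id, which is what makes a later rename safer.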

pylint

pylint is a tool that helps validate your Python code against a myriad of known sources of bugs. For example, say that you forgot to add self to your method:

class Foo:
    def get_id():
        return self.id

pylint complains:

Method has no argument
Method could be a function
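A corrected version of the class might look like the sketch below. The constructor and its id_ parameter are hypothetical additions so the example runs on its own; pylint may still report unrelated conventions, such as missing docstrings.

```python
class Foo:
    def __init__(self, id_: str) -> None:
        # id_ is a hypothetical parameter, not part of the original snippet.
        self.id = id_

    def get_id(self) -> str:
        # self is now declared, so self.id refers to the instance attribute.
        return self.id


print(Foo("foo").get_id())
```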

pylint also catches an invalid file mode when opening a file:

f = open("test.txt", "rwx")

pylint tells you:

"rwx" is not a valid mode for open

black

Python uses indentation to delimit blocks, so the “where do I put the curly braces” war that other languages fight is not a problem. However, there is still more than one way to format Python. Do you put spaces around operators, as in 1 + 2 + 3, or do you prefer 1+2+3? Where do you break function calls with lots of arguments? Like this?

f(1,2,3,
  4,5)

…or like this:

f(1,
  2,
  3,
  4,
  5)

To avoid commits that inadvertently introduce purely stylistic changes, we format our source code using black before committing. black offers almost no options with respect to code style; it simply enforces one specific style. This way, the same style choices apply everywhere, and diffs contain only meaningful changes.
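As a rough illustration, assuming black's default settings (a line length of 88 columns), both hand-formatted calls above should end up as a single line, because the whole call fits. The function f is a hypothetical stand-in defined only so the snippet runs.

```python
def f(a: int, b: int, c: int, d: int, e: int) -> int:
    # Hypothetical helper; any five-argument function would do.
    return a + b + c + d + e


# The call fits within the line limit, so black joins it onto one line
# and puts a single space after each comma.
total = f(1, 2, 3, 4, 5)

# Binary operators get one space on each side.
spaced = 1 + 2 + 3
```

Since everyone's editor produces the same output, a diff never contains a re-wrapped argument list next to the real change.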

reorder-python-imports

There’s an aspect of code style that black doesn’t cover: import statements. And when you have a Python project with more than one git branch, these statements are a nightmare. Consider a statement like:

from typing import Optional

If two people on different branches add something to that, say:

# branch a
from typing import Optional, List
# branch b
from typing import List, Optional

That’s a completely unnecessary merge conflict coming up!

reorder-python-imports solves this with a few tricks:

  • expanding imports so there is one import for each class/function/… (in other words, there are no commas in an import line)
  • sorting imports
  • removing duplicates

This has the disadvantage of adding more lines of code at the top of the file, but it solves many of the problems you encounter using imports on many branches.
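For the typing example above, the expanded and sorted form would look roughly like this. The function first is a hypothetical helper, added only to show both names in use.

```python
from typing import List
from typing import Optional


def first(items: List[str]) -> Optional[str]:
    # Hypothetical helper: return the first element, or None for an empty list.
    return items[0] if items else None
```

Both branches now add the identical line `from typing import List` at the same position, so git has nothing to reconcile.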

Miscellaneous

Besides that, we have a few helper scripts which alleviate some minor problems, notably:

  • a script to check if somebody left a breakpoint() or ipdb.set_trace() call in the code
  • a script to check if every directory has an __init__.py file
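These are not our actual scripts, but a minimal sketch of how such checks could be written with the standard library. The function names are hypothetical, and we assume here that only directories containing .py files need an __init__.py.

```python
import ast
from pathlib import Path
from typing import List


def find_debug_calls(source: str) -> List[int]:
    """Return line numbers of breakpoint() and <anything>.set_trace() calls."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id == "breakpoint":
            lines.append(node.lineno)
        elif isinstance(func, ast.Attribute) and func.attr == "set_trace":
            lines.append(node.lineno)
    return sorted(lines)


def dirs_missing_init(root: Path) -> List[Path]:
    """Return directories under root that hold .py files but no __init__.py."""
    candidates = sorted({path.parent for path in root.rglob("*.py")})
    return [d for d in candidates if not (d / "__init__.py").exists()]
```

Parsing with ast instead of searching the text means a mention of set_trace inside a comment or string is not reported. Both functions return their findings, so a wrapper script can print them and exit with a non-zero status in a pre-commit hook or CI job.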

Conclusion

When starting the project, we actually had none of these tools installed. We added them one by one because we noticed certain problems (like the merge conflicts in import statements described above). Writing this, we hope to give you a better starting point than we had with Python and large projects.
