Published in Analytics Vidhya

How to automate Python code formatting

Ello ello ello my fellow engineers!

I wanted to discuss Python Protocol.

Firstly, we will discuss a few formatting concepts and the reasons why we should follow a Python protocol. Then I’ll show you three tools that you can use to automate the whole process! We’ll cover the libraries Black, YAPF and isort. Sound good? Let us begin…

If you are more of an auditory learner I’ve made a video on the basics of this discussion and the tools for automation here: https://youtu.be/q2d5vrw1Lss

What is best practice?

Is it just following the PEP 8 style guide?

PEP 8 is a set of code conventions kept as a living document that evolves with changes to Python (see https://www.python.org/dev/peps/pep-0008/).

Or is it following the Zen of Python, a set of guiding principles for coding in Python (see https://www.python.org/dev/peps/pep-0020/)?

Neither of these enhancement proposals covers everything, and by design both are pretty flexible.

Should you be opinionated on this?

Should you be allowed any personal preference when coding in Python? Is there even a right or wrong way to code, as long as the code runs as expected and there is test coverage? Do you place value on efficiency of code over aesthetics of readability?

There are so many questions that you could debate this until Python becomes legacy.

My team and I were discussing the way we code and some of the inconsistencies that crop up across our code bases. Each of us brought up what we like to see in the format of our code.

The one thing we all agreed on was consistency, from functions and classes to the way we write our tests. As long as we are all on the same page, we can start to focus on the more important things.

Pros to Python Protocol

  • Code Review: When you review code you do not want your focus drawn to stylistic comments. You want to focus on the feature being created and the coverage of your tests.
  • Consistency: Having a consistent style across codebases makes your code more readable. Our human intuition is to look for patterns. Therefore, a consistent codebase allows developers (especially those new to the codebase) to pick up the essence of the code faster.
  • Focus: Once you have a protocol in place, it eventually becomes second nature. Then you can start to focus on the real problems and the functions you are creating.
  • Aesthetics: Consistent codebases are nice to look at!

What do we do?

As a base we use PEP 8 and PEP 20, and if you are linting you will need to conform to PEP 8 anyway.

Rules: These are ever changing; such is the way with coding. But we have set some basic rules to adhere to:

  • Module docstrings contain a detailed description of the module and have example use cases where needed.
  • Imports follow PEP 8 with standard modules first, followed by third party and then application level modules.
  • Multiple imports from a single dependency are wrapped in parentheses with a comma at the end. The comma means GitHub shows any new addition as a single line change rather than a removal and replacement.
  • Global variables are in all caps and, like all variables, are descriptive of the data they contain.
  • A single exception for a module is stored in the same file as the code. However, if there are several, we create a new file for exceptions and import them (to avoid cluttering).
  • Function names should be as descriptive as possible.
  • We currently use Python 3.7, so for our functions we use type hints for the parameters and the expected return value.
  • Complex functions with multiple variables may need a docstring.
  • Functions with exception handling must log the exception and have a docstring explanation of the handling.
  • Dictionaries should have a key and value on each line with a comma at the end.
  • For list creation we try to remain Pythonic (i.e. list comprehension) but if it doesn’t fit on a single line, then we use a loop.
  • For logging we use f-strings (it looks a bit nicer).

Below is an example of the above rules written in code:

"""
Module docstring describing what the module does
"""
import logging
import os
import sys

import pandas as pd

from file_to_be_imported import descriptive_function_1
from files_to_be_imported import (
    descriptive_function_2,
    descriptive_function_3,
    descriptive_function_4,
)

GLOBAL_VARIABLE = "global variable"


class SomeException(Exception):
    """this is an exception and needs a docstring"""

    pass


def single_variable(var: str):
    descriptive_function_1(var)


def multiple_variables(
    descriptive_var_1: str,
    descriptive_var_2: int,
    descriptive_var_3: bool,
):
    try:
        descriptive_function_2(descriptive_var_1)
        descriptive_function_3(descriptive_var_2)
        descriptive_function_4(descriptive_var_3)
    except SomeException:
        logging.exception("Some Exception occurred")


descriptive_dict_example = {
    "key1": "value1",
    "key2": "value2",
}

descriptive_list_example = [x for x in range(0, 10)]

logging.info(f"{descriptive_dict_example} and {descriptive_list_example}")
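
To see why the trailing comma rule helps in code review, here is a minimal sketch that counts the lines a line-based diff would report when one more import is added. The helper names add_import and changed_lines are made up for this illustration, and the standard library's difflib stands in for the diff GitHub shows, which is an assumption rather than a description of how GitHub works internally.

```python
from difflib import SequenceMatcher
from typing import Tuple

WITHOUT_TRAILING_COMMA = """\
from files_to_be_imported import (
    descriptive_function_2,
    descriptive_function_3
)
"""

WITH_TRAILING_COMMA = """\
from files_to_be_imported import (
    descriptive_function_2,
    descriptive_function_3,
)
"""


def add_import(source: str, name: str) -> str:
    """Append `name` to the parenthesised import list, keeping its comma style."""
    lines = source.splitlines()
    had_trailing_comma = lines[-2].endswith(",")
    if not had_trailing_comma:
        # Without a trailing comma the previous last item has to be edited too.
        lines[-2] += ","
    # Insert the new name just before the closing parenthesis.
    lines.insert(-1, f"    {name}" + ("," if had_trailing_comma else ""))
    return "\n".join(lines) + "\n"


def changed_lines(before: str, after: str) -> Tuple[int, int]:
    """Return (removed, added) line counts between two versions of a file."""
    removed = added = 0
    matcher = SequenceMatcher(None, before.splitlines(), after.splitlines())
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed += i2 - i1
            added += j2 - j1
    return removed, added


for label, source in (
    ("without trailing comma", WITHOUT_TRAILING_COMMA),
    ("with trailing comma", WITH_TRAILING_COMMA),
):
    updated = add_import(source, "descriptive_function_4")
    print(label, changed_lines(source, updated))
```

Without the trailing comma the diff touches the old last line as well as the new one; with it, only the new line shows up.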

DEMO TIME

Automate formatting!

As software engineers, once we have solved a recurring problem we should automate it. I’ve come across two formatters that you may find useful, plus an import sorter to pair with them:

Black:

This is a simple-to-use module with very little configuration. They like to say “we are opinionated, so you don’t have to be”.

https://black.readthedocs.io/en/stable/

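They have an online formatting tool you can use to see the results of Black formatting. To use it with Pipenv we have to install it using pip, because for some reason Pipenv doesn’t pick this library up directly. A likely cause is that Black’s releases were tagged as pre-releases (beta versions), which Pipenv ignores by default, but if anyone could tell me for certain, I’d love to know: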

pipenv run pip install black

Below I have created a file that I would like to format:

import os, sys
import logging
import pandas as pd

from file_to_be_imported import descriptive_function_1
from files_to_be_imported import descriptive_function_2, descriptive_function_3, descriptive_function_4
GLOBAL_VARIABLE = "global variable"

class SomeException(Exception):
    """this is an exception and needs a docstring"""
    pass

def single_variable(var: str):
    descriptive_function_1(var)

def multiple_variables(descriptive_var_1: str,
    descriptive_var_2: int,
    descriptive_var_3: bool,
):
    """
    This is a more complex function so this
    Params are not needed if they are descriptive enough unless they are dataframes. Dataframes should be described.
    SomeException: The cases for Exceptions should always be described
    """
    try:
        descriptive_function_2(descriptive_var_1,)
        descriptive_function_3(descriptive_var_2)
        descriptive_function_4(descriptive_var_3)
    except SomeException:
        logging.exception("Some Exception occurred")

descriptive_dict_example = {"key1": "value1","key2": "value2",}
descriptive_list_example = [
    x for x in range(0, 10)
]
logging.info(f"{descriptive_dict_example} and {descriptive_list_example}")

(A keen eye will also notice that there is no newline at the end of the file.)

Then to run Black:

pipenv run black <path to file>

So in our case:

pipenv run black file_to_be_formatted.py

The result is below:

file_to_be_formatted.py

import os, sys
import logging
import pandas as pd

from file_to_be_imported import descriptive_function_1
from files_to_be_imported import (
    descriptive_function_2,
    descriptive_function_3,
    descriptive_function_4,
)

GLOBAL_VARIABLE = "global variable"


class SomeException(Exception):
    """this is an exception and needs a docstring"""

    pass


def single_variable(var: str):
    descriptive_function_1(var)


def multiple_variables(
    descriptive_var_1: str, descriptive_var_2: int, descriptive_var_3: bool,
):
    try:
        descriptive_function_2(descriptive_var_1,)
        descriptive_function_3(descriptive_var_2)
        descriptive_function_4(descriptive_var_3)
    except SomeException:
        logging.exception("Some Exception occurred")


descriptive_dict_example = {
    "key1": "value1",
    "key2": "value2",
}
descriptive_list_example = [x for x in range(0, 10)]

logging.info(f"{descriptive_dict_example} and {descriptive_list_example}")

A couple of points to note here:

  • It hasn’t quite met the PEP 8 standard for imports (import os, sys is still on one line and the imports are not sorted), though it has parenthesised the multiple imports.
  • It has added a new line after our single line docstring.
  • It has listed all the parameters of a function on one line.
  • Dictionary key-values are on individual lines.
  • The list comprehension is on one line as well.

All in all it has done a pretty decent job, and if you really do not want to be opinionated on formatting perhaps this is the formatter for you.
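
If handing your code to a formatter makes you nervous, you can check for yourself that only the layout changed. The sketch below compares the syntax trees of two versions of a snippet using the standard library's ast module; the function name same_syntax_tree is made up for this post. As I understand it, Black runs a similar equivalence check of its own after formatting, but this is an independent illustration, not Black's implementation.

```python
import ast


def same_syntax_tree(before: str, after: str) -> bool:
    """True if both sources parse to the same AST, ignoring layout and comments."""
    # ast.dump leaves out line and column numbers by default,
    # so pure whitespace and line-break changes do not affect the result.
    return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))


UNFORMATTED = 'descriptive_dict_example = {"key1": "value1","key2": "value2",}\n'

FORMATTED = """\
descriptive_dict_example = {
    "key1": "value1",
    "key2": "value2",
}
"""

# A version where a value was changed by accident, not just the layout.
ACCIDENTALLY_EDITED = """\
descriptive_dict_example = {
    "key1": "value1",
    "key2": "value3",
}
"""

print(same_syntax_tree(UNFORMATTED, FORMATTED))
print(same_syntax_tree(UNFORMATTED, ACCIDENTALLY_EDITED))
```

The first comparison reports a match because only whitespace differs; the second does not, because a value changed.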

YAPF:

Spawned in the depths of Google, YAPF is a highly configurable formatter. It uses things called knobs (which, if you are British, sounds hilarious) to configure the Python format.

They are set in a file called .style.yapf, see below:

[style]
based_on_style = pep8
dedent_closing_brackets = true
each_dict_entry_on_separate_line = true

The based_on_style needs to be set: the options are pep8, google, facebook and yapf. Then you can add your knobs to configure the format style you want. You can also just use based_on_style on its own. In the file above I have set closing brackets to dedent and each dictionary entry to go on a separate line.
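
The style file shown above is plain INI syntax, so you can read it back with Python's configparser if you ever want to check which knobs a repository has set. This is only a sketch for inspecting the file; it makes no claim about how YAPF itself loads it, and the function name read_knobs is made up for the example.

```python
import configparser
from typing import Dict

STYLE_FILE = """\
[style]
based_on_style = pep8
dedent_closing_brackets = true
each_dict_entry_on_separate_line = true
"""


def read_knobs(text: str) -> Dict[str, str]:
    """Return the entries of the [style] section as a plain dict of strings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["style"])


knobs = read_knobs(STYLE_FILE)
for name, value in knobs.items():
    print(f"{name} = {value}")
```

To read a real file, pass the contents of .style.yapf to read_knobs instead of the inline string.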

You can then install and run YAPF with the following:

pipenv install -d yapf
pipenv run yapf -i file_to_be_formatted.py

Using the previous file we end up with the result:

import os, sys
import logging
import pandas as pd

from file_to_be_imported import descriptive_function_1
from files_to_be_imported import descriptive_function_2, descriptive_function_3, descriptive_function_4

GLOBAL_VARIABLE = "global variable"


class SomeException(Exception):
    """this is an exception and needs a docstring"""
    pass


def single_variable(var: str):
    descriptive_function_1(var)


def multiple_variables(
    descriptive_var_1: str,
    descriptive_var_2: int,
    descriptive_var_3: bool,
):
    try:
        descriptive_function_2(descriptive_var_1, )
        descriptive_function_3(descriptive_var_2)
        descriptive_function_4(descriptive_var_3)
    except SomeException:
        logging.exception("Some Exception occurred")


descriptive_dict_example = {
    "key1": "value1",
    "key2": "value2",
}
descriptive_list_example = [x for x in range(0, 10)]

logging.info(f"{descriptive_dict_example} and {descriptive_list_example}")

Notice that YAPF, like Black, does not touch the imports. But we have managed to dedent the closing brackets on our functions, and the dictionary key-value pairs are all on separate lines. Overall it has stuck to PEP 8 in terms of general style.
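
One of the import problems both formatters leave behind is import os, sys, which PEP 8 asks you to split across separate lines. As a rough illustration of how you might spot this yourself, the sketch below walks the syntax tree and reports the line numbers of such statements. The function name multi_module_imports is made up for this post, and this is not how any of the tools here work internally.

```python
import ast
from typing import List


def multi_module_imports(source: str) -> List[int]:
    """Return the line numbers of `import a, b` style statements."""
    return sorted(
        node.lineno
        for node in ast.walk(ast.parse(source))
        # "from x import a, b" is an ast.ImportFrom node, so it is not flagged.
        if isinstance(node, ast.Import) and len(node.names) > 1
    )


# The source is only parsed, never executed, so pandas need not be installed.
BEFORE_SORTING = """\
import os, sys
import logging
import pandas as pd
"""

AFTER_SORTING = """\
import logging
import os
import sys
import pandas as pd
"""

print(multi_module_imports(BEFORE_SORTING))
print(multi_module_imports(AFTER_SORTING))
```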

If you want control of imports too then you’ll need to combine one of the previous two formatters with isort.

To install and use isort:

pipenv install -d isort
pipenv run isort file_to_be_formatted.py

This will sort your imports and parenthesise any multiple import statements.

"""file_formatted_with_isort.py"""
import logging
import os
import sys
import pandas as pd

from file_to_be_imported import descriptive_function_1
from files_to_be_imported import (descriptive_function_2,
                                  descriptive_function_3,
                                  descriptive_function_4)

Then to combine this with YAPF you need to add a comma after the last name in the multiple import statement.

"""file_formatted_with_isort.py"""
import logging
import os
import sys
import pandas as pd

from file_to_be_imported import descriptive_function_1
from files_to_be_imported import (descriptive_function_2,
                                  descriptive_function_3,
                                  descriptive_function_4,)

Your final result using both isort and YAPF:

"""file_formatted_with_isort_and_yapf.py"""

import logging
import os
import sys
import pandas as pd

from file_to_be_imported import descriptive_function_1
from files_to_be_imported import (
    descriptive_function_2,
    descriptive_function_3,
    descriptive_function_4,
)

GLOBAL_VARIABLE = "global variable"


class SomeException(Exception):
    """this is an exception and needs a docstring"""
    pass


def single_variable(var: str):
    descriptive_function_1(var)


def multiple_variables(
    descriptive_var_1: str,
    descriptive_var_2: int,
    descriptive_var_3: bool,
):
    try:
        descriptive_function_2(descriptive_var_1, )
        descriptive_function_3(descriptive_var_2)
        descriptive_function_4(descriptive_var_3)
    except SomeException:
        logging.exception("Some Exception occurred")


descriptive_dict_example = {
    "key1": "value1",
    "key2": "value2",
}
descriptive_list_example = [x for x in range(0, 10)]

logging.info(f"{descriptive_dict_example} and {descriptive_list_example}")

You can see that the imports now meet the PEP 8 standard. We have parenthesised and dedented the multiple import statement too! Fantastic!
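
The isort and YAPF commands used above can be chained from a small Python function. This is a minimal sketch, assuming Pipenv, isort and YAPF are installed as shown earlier; the names build_commands and format_file are made up for this post, and the manual step of adding the trailing comma between the two tools is not covered.

```python
import subprocess
from typing import Callable, List


def build_commands(path: str) -> List[List[str]]:
    """Return the commands from this post, in the order they should run."""
    return [
        ["pipenv", "run", "isort", path],       # sort the imports first
        ["pipenv", "run", "yapf", "-i", path],  # then reformat the file in place
    ]


def format_file(path: str, run: Callable[..., object] = subprocess.run) -> None:
    """Run each command in turn; check=True stops at the first failure."""
    for command in build_commands(path):
        run(command, check=True)


# Show what would be executed without running anything.
for command in build_commands("file_to_be_formatted.py"):
    print(" ".join(command))
```

Calling format_file("file_to_be_formatted.py") runs both tools for real. The run parameter is only there so that the function can be tested without Pipenv installed.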

I hope this gives a little insight into how these tools can be used. I really do believe Python Protocol should be standardised, so that as developers we can concentrate on the important things.

I’ll be writing a shell script to automate this whole process, which I’ll post at some point in a follow-up blog.

If you have any questions, feel free to ask. I’ll catch you on the next one.
