Automatic QA Code: Pre-commit

Marc Domènech i Vila
10 min read · Apr 29, 2024

Of course we want quality code; who wouldn’t? But to improve code quality, we first have to define what it is. In this article, we are going to learn what QA is and how we can apply it automatically in a Python project using Pre-commit.

Code quality generally refers to how functional and maintainable your code is.

In other words, we consider code to be high quality if:

  • It does what it is supposed to do.
  • It does not contain defects or problems.
  • It is easy to read, maintain, and extend.

1. Why Ensuring Code Quality is Important 🚨

Often, when developing software, we don’t pay attention to good practices, code readability, and so on. We simply assume that our code is good (spoiler: not really). This assumption would not be a problem if we were working alone and were sure that nobody else would ever see our code, but that is rarely the case. The problem becomes more critical when we deploy our code to production. Ensuring code quality can help us to:

  1. Reduce bugs and errors: Catching errors early saves time and effort in the debugging and maintenance phases.
  2. Improve code readability: Well-written code that adheres to style guidelines is easier to read, understand, and maintain.
  3. Facilitate collaboration: Consistent code style and practices are critical when multiple developers work on the same project.
  4. Speed up project timelines: Fewer bugs and clearer code can significantly speed up development and reduce time-to-market.

Now that we are convinced that we have to ensure a certain level of quality in our code, the question is: which aspects of our code are we going to focus on?

  • Linters: Linters are tools that analyze source code to find programming issues, syntax errors, and other potential problems. Popular examples include pylint, flake8, and pyflakes.
  • Code Formatters: Code formatters automate the process of formatting source code according to predefined style conventions. Examples include black, autopep8, and yapf.
  • Automated Testing: Automated tests, such as unit and integration tests, are crucial for ensuring the functionality and quality of the code. Frameworks like unittest, pytest, and nose allow for writing and running tests in an automated way.
  • Static Code Analysis: Static analysis tools like bandit (for detecting security issues), mypy (for static type checking), and prospector (which integrates several static analyzers) can identify potential problems in the code without needing to run it. These tools can help find security vulnerabilities, logical errors, and possible performance improvements.
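To make these categories a bit more concrete, here is a small sketch. The function names are hypothetical and are not part of any tool; the first version contains the kind of issues that linters and type checkers typically report, and the second is a cleaned-up version.

```python
"""Hypothetical example: the same function before and after a QA pass."""
from typing import List


# Before: it runs, but QA tools would typically complain.
# - 'unused' is assigned and never read (an unused local variable)
# - there are no type hints, so a type checker cannot verify callers
# - an empty list raises ZeroDivisionError
def mean_before(values):
    unused = len(values)
    return sum(values) / len(values)


# After: typed, documented, and with the empty-input case handled.
def mean_after(values: List[float]) -> float:
    """Return the arithmetic mean of values, or 0.0 for an empty list."""
    if not values:
        return 0.0
    return sum(values) / len(values)


if __name__ == "__main__":
    print(mean_after([1.0, 2.0, 3.0]))
```

Both versions return the same result for normal inputs; the difference is in how easy the code is to check, read, and maintain.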

Ok, so… how can we apply these tests/analyses in our project? Don’t worry, here is where Pre-commit comes in.

2. Pre-commit ️⚠️

pre-commit is a framework for managing and maintaining multi-language pre-commit hooks. A “hook” is a script or set of commands that is executed automatically. But when are these checks going to be executed?

This tool allows us to configure a set of quality checks that run automatically before a git command (commit, push, etc.) completes. This is powerful because each time we modify our git repository, we can ensure that the added code follows the quality rules we have set.
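Conceptually, a hook receives a list of files and reports success or failure through its exit code, and a non-zero exit code stops the git operation. The following is only a sketch of that idea, not pre-commit's real code; run_hooks and names_have_no_spaces are hypothetical names.

```python
"""Conceptual sketch of a hook runner (not pre-commit's implementation)."""
from typing import Callable, Dict, List

# A check receives the file names and returns True when they pass.
Check = Callable[[List[str]], bool]


def names_have_no_spaces(files: List[str]) -> bool:
    """Hypothetical check: file names must not contain spaces."""
    return all(" " not in name for name in files)


def run_hooks(files: List[str], hooks: Dict[str, Check]) -> int:
    """Run every check and return 0 if all pass, 1 otherwise."""
    exit_code = 0
    for name, check in hooks.items():
        passed = check(files)
        print(f"{name}: {'Passed' if passed else 'Failed'}")
        if not passed:
            # Keep running the remaining hooks, but remember the failure
            exit_code = 1
    return exit_code


if __name__ == "__main__":
    hooks = {"no-spaces-in-names": names_have_no_spaces}
    raise SystemExit(run_hooks(["src/app.py", "my notes.txt"], hooks))
```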

2.1 Pre-commit: Installation ✅

To install pre-commit, run the following command:

pip install pre-commit

2.2 Pre-commit: Repo structure 🏛️

To configure pre-commit, we work with 2 files located in the project root: .pre-commit-config.yaml and pyproject.toml.

project/
├── src/
├── .pre-commit-config.yaml
└── pyproject.toml

  • .pre-commit-config.yaml: Mandatory. Configures the QA pipeline. It contains each of the hooks to be executed before a git command completes.
  • pyproject.toml: Optional. Configuration file used by packaging tools, as well as other tools such as linters, type checkers, etc. It allows more specific hook configurations.

2.3 Pre-commit: QA Pipeline 🧪

Now, let’s see how we can modify these files. First of all, we are going to see a simple example of .pre-commit-config.yaml configuration:

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.3.0
    hooks:
      - id: check-yaml
      - id: end-of-file-fixer
      - id: trailing-whitespace
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.2
    hooks:
      - id: ruff
        args: [ --fix ]
      - id: ruff-format

In this case, we are defining 5 steps (one for each ID): the first 3 check YAML syntax, end-of-file newlines, and trailing whitespace. The last 2 hooks come from Ruff, a fast linter and formatter. Let’s understand the structure of the file.

  • repo: URL where the hook repository is located. It can also be local if we are going to run a custom hook.
  • rev: The tag of the version we want to use.
  • hooks: The list of hooks we want to use from this repo.
  • id: ID of the specific hook.

For each hook, we can specify more parameters like types, stages, etc. You can see the list in the official documentation.

On the other hand, some tools are more configurable; for those, we can put more specific settings in the pyproject.toml file:

# Other Pyproject stuff
# ...

# Ruff hook configuration
[tool.ruff]
line-length = 88

In this example, we have configured Ruff to use a maximum line length of 88 characters. To find out which options you can set in this file, check the official documentation of each tool.
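As a rough illustration of what a line-length limit means in practice, the sketch below lists the lines of a source string that exceed a given limit. It is not how Ruff works internally; find_long_lines is a hypothetical helper, and the limit of 88 simply mirrors the value configured above.

```python
"""Hypothetical helper that reports lines longer than a limit."""
from typing import List, Tuple


def find_long_lines(source: str, line_length: int = 88) -> List[Tuple[int, int]]:
    """Return (line_number, length) for every line longer than line_length."""
    return [
        (number, len(line))
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > line_length
    ]


if __name__ == "__main__":
    # The second line is deliberately longer than 88 characters
    sample = "short = 1\n" + "x = '" + "a" * 90 + "'\n"
    for number, length in find_long_lines(sample):
        print(f"line {number}: {length} characters")
```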

2.4 Pre-commit: Execution 🖥️

Before running the hooks, we need to install them. By default, pre-commit installs the dependencies of each hook in isolated virtual environments, so we don’t have to worry about conflicts with our own packages. That’s cool! However, if we want to force pre-commit to use the libraries installed in our current environment, we need to define all the hooks as custom (local) hooks, as explained in section 2.5.

So, regardless of whether the hooks use isolated virtual environments for their dependencies, we need to install the git hook script. To do so, we use the following command:

pre-commit install

After this, we can install additional hook types if needed. In this case, I found it useful to install commit-msg and pre-push. This enables us to configure checks that run automatically on the commit message and before a push.

pre-commit install --hook-type commit-msg --hook-type pre-push

From now on, pre-commit will be triggered by the git operations whose hook types we installed (commit, push, etc.) and will run every hook whose stages parameter includes that stage. With only the default pre-commit install, the hooks run at each git commit.

Alternatively, we can run the hooks manually using the following commands:

# Run on a set of files
pre-commit run --files file1 file2 ...

# Run on all files
pre-commit run --all-files
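If you prefer to trigger these manual runs from a Python script (for example, in a CI job), a minimal sketch could look like the following. The helper names build_pre_commit_command and run_pre_commit are hypothetical; only the pre-commit run, --files, and --all-files options come from the commands above.

```python
"""Hypothetical wrapper around the manual 'pre-commit run' commands."""
import shutil
import subprocess
from typing import List, Optional, Sequence


def build_pre_commit_command(files: Optional[Sequence[str]] = None) -> List[str]:
    """Build the command for a set of files, or for all files if none given."""
    if files:
        return ["pre-commit", "run", "--files", *files]
    return ["pre-commit", "run", "--all-files"]


def run_pre_commit(files: Optional[Sequence[str]] = None) -> int:
    """Run pre-commit and return its exit code (non-zero means a hook failed)."""
    if shutil.which("pre-commit") is None:
        raise RuntimeError("pre-commit is not installed or not on PATH")
    return subprocess.run(build_pre_commit_command(files), check=False).returncode


if __name__ == "__main__":
    # Only print the command here; call run_pre_commit() inside a git repo
    print(" ".join(build_pre_commit_command(["src/app.py"])))
```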

2.5 Custom Hooks 🪡

If you want to define custom QA rules, you can write a script in the programming language of your choice. In this case, a Python script is provided as an example.

The developed hook allows for checking that files with a specific extension are not being committed. This tool can be very helpful in detecting if sensitive data such as CSVs, Excels, etc., are being uploaded.

Below is the implementation of the custom hook (check_file_extensions.py):

import sys
from pathlib import Path
from typing import List

import configargparse  # third-party package


def check_file_extensions(
    file_extensions: List[str],
    files: List[Path],
) -> List[Path]:
    """
    Look for files that have unwanted formats.

    Parameters
    ----------
    file_extensions : List[str]
        Lowercase extensions that we want to reject (e.g. ".json").
    files : List[Path]
        Files to check.

    Returns
    -------
    List[Path]
        Files whose extension is in the unwanted list.
    """
    invalid_files = []
    for file_path in files:
        if (
            file_path.is_file()
            and file_path.suffix.lower() in file_extensions
            # Skip paths that start with a dot (hidden files or folders)
            and not str(file_path).startswith(".")
        ):
            invalid_files.append(file_path)
    return invalid_files


if __name__ == "__main__":
    # Parse arguments
    parser = configargparse.ArgumentParser(description="File extension checker")
    parser.add_argument(
        "--formats",
        nargs="+",
        required=True,
        help="List of unwanted file extensions",
        type=str.lower,
    )
    parser.add_argument(
        "--files",
        nargs="+",
        required=True,
        help="List of files",
        type=Path,
    )
    args = parser.parse_args()
    # Check for invalid format files
    invalid_files = check_file_extensions(
        args.formats,
        args.files,
    )
    if invalid_files:
        print("Files with an unwanted format:")
        for file_path in invalid_files:
            print(file_path)
        # A non-zero exit code makes pre-commit mark the hook as failed
        sys.exit(1)
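To see how the check behaves, here is a small self-contained sketch that runs the same logic against a temporary folder. It includes a compact copy of the function so that it runs on its own; the file names and the demo helper are made up for illustration.

```python
"""Try the extension check on a temporary folder (illustrative only)."""
import tempfile
from pathlib import Path
from typing import List


def check_file_extensions(file_extensions: List[str], files: List[Path]) -> List[Path]:
    """Compact copy of the hook logic, so this sketch runs on its own."""
    return [
        path
        for path in files
        if path.is_file()
        and path.suffix.lower() in file_extensions
        and not str(path).startswith(".")
    ]


def demo() -> List[str]:
    """Create a few example files and return the names that get flagged."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("data.JSON", "notes.txt", "script.py"):
            (root / name).write_text("example")
        flagged = check_file_extensions([".json"], sorted(root.iterdir()))
        return [path.name for path in flagged]


if __name__ == "__main__":
    print(demo())
```

Note that the comparison is case-insensitive on the file side, because the suffix is lowercased before it is compared with the unwanted extensions.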

Once the hook is implemented, it needs to be added to the configuration file .pre-commit-config.yaml in the following way:

repos:
  # Custom hook
  - repo: local
    hooks:
      - id: my-custom-hook
        name: extension-file-checker
        entry: python qa_code/custom_hooks/check_file_extensions.py  # Path to the script
        types: [file]
        language: system
        args: ["--formats", ".json", "--files"]
        require_serial: true
        pass_filenames: true
        stages: [commit, manual]

This configuration specifies the following:

  • repo: Set to local, which means a custom hook is being defined.
  • id: Unique identifier of the new hook.
  • name: Name of the new hook to display.
  • entry: Command to execute the hook.
  • language: The language to execute the hook with. Setting it to system forces the hook to use the libraries installed in the current environment. Otherwise, pre-commit will create a virtual environment to install the necessary dependencies.
  • args: Arguments that the script receives.
  • require_serial: Forces the hook to run in a single process instead of in parallel.
  • additional_dependencies: Dependencies needed to launch the hook (not set in this example).
  • pass_filenames: Defines whether the names of the files to be analyzed are passed to the script as arguments.
  • stages: At which stages we want the hook to run. In this case, every time we make a git commit, and also when invoked manually with the pre-commit command (the manual stage).

3. My Pre-commit configuration 👍

In this section, I will show you the hooks I have found most interesting.

  • check-added-large-files: From Pre commit hooks. Check that large files are not uploaded.
  • check-yaml: From Pre commit hooks. Check that the .yaml files are written correctly.
  • check-toml: From Pre commit hooks. Check that the .toml files are written correctly.
  • end-of-file-fixer: From Pre commit hooks. Check that the files end with a new line.
  • trailing-whitespace: From Pre commit hooks. Remove trailing spaces.
  • ruff: From Ruff. Run code quality checks. Apply checks for PEP8 Convention, syntactic errors, code complexity, etc.
  • ruff-format: From Ruff. Automatically formats code according to a specific style. Some examples are: Code alignment, blanks, line length, etc.
  • numpydoc: From Numpydoc. Validates that the docstrings are in numpy format.
  • mypy: From Mypy. Analyzes the code for errors related to the data type and provides information about possible typing problems before the code is executed.
  • vulture: From Vulture. Helps identify unused code that is probably safe to remove.
  • commitizen: From Commitizen. Used to ensure that commit messages follow a specific format.
  • commitizen-branch: From Commitizen. Used to check that the commit messages already on the current branch follow a specific format (for example, before a push).
  • nbstripout: From Nbstripout. Allows you to remove run output from Jupyter notebook cells before committing changes.
  • pytest-check: From custom hook. Run the set of unit tests using pytest.
  • extension-file-checker: From custom hook. Allows you to check that files with a specific extension are not added to the commit.
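To give an idea of what a commit message check such as commitizen looks for, here is a sketch based on the Conventional Commits style (type, optional scope, then a description). The pattern below is a simplified assumption and is not Commitizen's actual rule; is_conventional is a hypothetical helper.

```python
"""Simplified commit message check (not Commitizen's real pattern)."""
import re

# Simplified pattern: type, optional scope, optional "!", then ": description"
CONVENTIONAL_COMMIT = re.compile(
    r"^(build|chore|ci|docs|feat|fix|perf|refactor|revert|style|test)"
    r"(\([\w\-./]+\))?!?: \S.*"
)


def is_conventional(message: str) -> bool:
    """Check only the first line (the subject) of a commit message."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    return CONVENTIONAL_COMMIT.match(subject) is not None


if __name__ == "__main__":
    for message in ("feat(api): add login endpoint", "updated stuff"):
        print(f"{message!r}: {is_conventional(message)}")
```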

In my case, when I use this kind of QA check in production software, having all the dependencies in the same environment is usually a requirement. So, all the hooks are configured as custom (local) hooks.

This is my .pre-commit-config.yaml file:

# pre-commit configuration file for python>=3.8
repos:
  - repo: local
    hooks:
      # Pre commit hooks https://github.com/pre-commit/pre-commit-hooks
      - id: check-added-large-files
        name: check for added large files
        description: prevents giant files from being committed.
        entry: check-added-large-files
        language: system
        args: ['--maxkb=123']
        stages: [commit, manual]
      - id: check-yaml
        name: check yaml
        description: checks yaml files for parseable syntax.
        entry: check-yaml
        language: system
        types: [yaml]
        stages: [commit, manual]
      - id: check-toml
        name: check toml
        description: checks toml files for parseable syntax.
        entry: check-toml
        language: system
        types: [toml]
        stages: [commit, manual]
      - id: end-of-file-fixer
        name: fix end of files
        description: ensures that a file is either empty, or ends with one newline.
        entry: end-of-file-fixer
        language: system
        types: [python]
        exclude: ^data/mlruns/
        stages: [commit, manual]
      - id: trailing-whitespace
        name: trim trailing whitespace
        description: trims trailing whitespace.
        entry: trailing-whitespace-fixer
        language: system
        types: [text]
        exclude: ^data/mlruns/
        stages: [commit, manual]
      # Ruff hooks https://github.com/astral-sh/ruff-pre-commit
      - id: ruff  # Linter
        name: ruff
        description: "Run 'ruff' for extremely fast Python linting"
        entry: ruff check --force-exclude
        language: system
        require_serial: true
        types_or: [python, pyi]
        args: [--fix]
        stages: [commit, manual]
      - id: ruff-format  # Formatter
        name: ruff-format
        description: "Run 'ruff format' for extremely fast Python formatting"
        entry: ruff format --force-exclude
        language: system
        exclude: '^(docs/|notebooks/demo_custom_argparse/)'
        types_or: [python, pyi, jupyter]
        require_serial: true
        stages: [commit, manual]
      # Numpy docstrings https://github.com/numpy/numpydoc
      - id: numpydoc-validation
        name: numpydoc-validation
        description: This hook validates that docstrings in committed files adhere to numpydoc standards.
        entry: python -m numpydoc.hooks.validate_docstrings
        require_serial: true
        language: system
        exclude: '^(docs/|notebooks/demo_custom_argparse/)'
        types: [python]
        stages: [commit, manual]
      # Static type checker https://github.com/pre-commit/mirrors-mypy
      - id: mypy
        name: mypy
        description: 'Mypy'
        entry: mypy
        language: system
        exclude: '^(templates/|docs/)'
        types_or: [python, pyi]
        args:
          [
            --ignore-missing-imports,
            --install-types,
            --non-interactive,
            --explicit-package-bases,
          ]
        require_serial: true
        stages: [commit, manual]
      # Dead code https://github.com/jendrikseipp/vulture
      - id: vulture
        name: vulture
        description: Find unused Python code.
        entry: vulture
        args: [".", --min-confidence, "100"]
        language: system
        pass_filenames: false
        require_serial: true
        stages: [commit, manual]
      # Conventional commits https://github.com/commitizen-tools/commitizen
      - id: commitizen
        name: commitizen check
        entry: cz check
        args: [--allow-abort, --commit-msg-file]
        stages: [commit-msg]
        language: system
      - id: commitizen-branch  # Checks the commit messages on the branch
        name: commitizen check branch
        description: >
          Check all commit messages that are already on the current branch but not the
          default branch on the origin repository. Useful for checking messages after
          the fact (e.g., pre-push or in CI) without an expensive check of the entire
          repository history.
        entry: cz check
        language: system
        args: [--rev-range, origin/HEAD..HEAD]
        always_run: true
        pass_filenames: false
        stages: [push]
      # Strip notebook outputs https://github.com/kynan/nbstripout
      - id: nbstripout
        types: [jupyter]
        name: strip notebooks outputs
        entry: nbstripout
        language: system
        pass_filenames: true
        stages: [commit, manual]
      # Unit tests with pytest
      - id: pytest-check
        # stages: [push]
        types: [python]
        name: pytest-check
        entry: python -m pytest
        language: system
        pass_filenames: false
        always_run: true
        stages: [commit, manual]
      # Custom hook: Extension file checker
      - id: extension-file-checker
        name: extension-file-checker
        entry: python qa_code/custom_hooks/check_file_extensions.py  # Path to the custom hook script
        types: [file]
        language: system
        exclude: '^(notebooks/demo_custom_argparse/)'
        args: ["--formats", ".json", "--files"]
        require_serial: true
        pass_filenames: true
        stages: [commit, manual]
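As an illustration of what stripping notebook outputs means, the sketch below clears the outputs and execution counts of the code cells in a notebook represented as a Python dictionary. This is not nbstripout's implementation; strip_outputs is a hypothetical helper, and the example notebook contains only the fields needed for the demonstration.

```python
"""Sketch of clearing notebook outputs (not nbstripout's implementation)."""
import json
from typing import Any, Dict


def strip_outputs(notebook: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the notebook with code cell outputs cleared."""
    # Round-trip through JSON as a simple deep copy of JSON-compatible data
    stripped = json.loads(json.dumps(notebook))
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


if __name__ == "__main__":
    example = {
        "cells": [
            {"cell_type": "markdown", "source": ["# Title"]},
            {
                "cell_type": "code",
                "source": ["1 + 1"],
                "execution_count": 3,
                "outputs": [{"output_type": "execute_result", "data": {"text/plain": ["2"]}}],
            },
        ]
    }
    print(json.dumps(strip_outputs(example), indent=2))
```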

I hope this article helped you to understand the importance of developing high-quality Python code and how to put it into practice using Pre-commit.

Contact me 📬

If you liked the article, let me know with a big clap 👏 !

Also you can follow me on Medium or contact me via Linkedin.

See you in the next post! 🤘
