Setting up your Financial Model
Once you have got the hang of Python (if not yet, have a look here) and have installed a code editor such as Visual Studio Code, it is time to start working on your own project and to use the tools that greatly improve the quality of a model.
By applying the structure described here right from the start, you ensure that anything you build is maintainable and scalable. This is especially important when you want to share your model with others or use it for a longer period of time.
This guide is part of a series related to building financial models:
- Setting up your Financial Model
- Structure your Financial Model
- Build your Financial Model
- Test your Financial Model
Directory Structure
The folder structure could look like the following, with an emphasis on the financetoolkit folder that contains the actual financial models:
The tests folder mirrors this structure, except that it contains the unit tests for pytest to run:
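The mirroring can be made concrete with a small helper. This is a minimal sketch, assuming pytest's default test_ file-name prefix; the function name mirror_test_path and the module path financetoolkit/ratios/ratios_model.py are hypothetical and only illustrate how a module maps onto its test file.

```python
from pathlib import PurePosixPath


def mirror_test_path(module_path: str, package: str = "financetoolkit", tests_dir: str = "tests") -> str:
    """Map a module inside the package to its mirrored test file (hypothetical helper)."""
    relative = PurePosixPath(module_path).relative_to(package)
    # Keep the sub-folders and prefix the file name with "test_" so pytest discovers it
    test_file = f"test_{relative.name}"
    return str(PurePosixPath(tests_dir, *relative.parent.parts, test_file))


print(mirror_test_path("financetoolkit/ratios/ratios_model.py"))
# tests/ratios/test_ratios_model.py
```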
I tend to include the following files to help manage the project. They are present in every project of mine, whether personal or professional; in fact, I usually copy them over from the Finance Toolkit repository, as it is a great template to start with.
This consists of the following files:
- README.md, which includes a (short) description of the repository with useful links. This is a file such as the one you are reading right now.
- .gitignore, which includes Python-specific and module-specific exclusions, e.g. .idea, .vscode, .venv, .pytest_cache and .DS_Store. See an example here of what such a file could look like.
- pyproject.toml, which includes the build setup, linter configurations and dependencies (example). This file is the core of the project and contains all the information needed to run it. It is the successor of the setup.py and setup.cfg files; read more about it in PEP 518.
- .pre-commit-config.yaml, a special file meant for using pre-commit. It configures which linters should be run prior to committing code, which keeps the quality of the code to the highest standard and makes it a must-have. It is activated by executing pre-commit install in the terminal. See an example file here.
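Because these files are expected in every project, a quick check for them can be scripted. The sketch below is a hypothetical helper, not part of the Finance Toolkit; it only reports which of the four files listed above are missing from a project root.

```python
from pathlib import Path

# The housekeeping files described above
PROJECT_FILES = ("README.md", ".gitignore", "pyproject.toml", ".pre-commit-config.yaml")


def missing_project_files(root: str = ".") -> list[str]:
    """Return the housekeeping files that do not exist in the given project root."""
    root_path = Path(root)
    return [name for name in PROJECT_FILES if not (root_path / name).is_file()]


if __name__ == "__main__":
    missing = missing_project_files()
    print("Missing:", ", ".join(missing) if missing else "none")
```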
The objective of these files is to lengthen the life span of the model. As an example, if you are still using append in Pandas, you haven't been on top of your package dependencies, given that this functionality has been deprecated since v1.4.0 (January 2022) and was removed altogether in v2.0.0. Not being on top of these developments means that, unless you change the code, you will be stuck on Pandas 1.x and, subsequently, on the older Python versions it supports.
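To illustrate, the removed DataFrame.append call can be replaced with pd.concat, the replacement that the Pandas deprecation notice points to. The column names and values below are made up for the example.

```python
import pandas as pd

# Illustrative data; the columns and numbers are made up
prices = pd.DataFrame({"ticker": ["AAA"], "close": [100.0]})
new_row = {"ticker": "BBB", "close": 50.0}

# Pandas < 2.0 only (deprecated in v1.4.0, removed in v2.0.0):
# prices = prices.append(new_row, ignore_index=True)

# Works on both Pandas 1.x and 2.x: wrap the row in a DataFrame and concatenate
prices = pd.concat([prices, pd.DataFrame([new_row])], ignore_index=True)
print(prices)
```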
Dependency Management
The chances of your model surviving the next few years go up significantly if you keep your dependencies up to date. In the pyproject.toml file, you define the minimum versions of Python, Pandas, NumPy, SciPy and other packages that the model requires to function. It would look something like this:
This uses the dependency manager Poetry, an advanced tool that understands the relationships between packages and their dependencies. It can also create a virtual environment for you and install all the dependencies in there. This is a great way to keep your dependencies in check, and updating them regularly helps you catch deprecated functions early. Poetry has extensive documentation, so it is recommended to have a look at it here.
To give you an idea of what it looks like:
And in case I wish to add a new dependency (in this case including extras):
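Poetry accepts extras in the package specification using square brackets, e.g. poetry add "requests[socks]". The hypothetical helper below only builds that command as an argument list (it does not run Poetry), which can be handy when scripting a project setup; requests and its socks extra are used purely as an illustration.

```python
import shlex


def poetry_add_command(package: str, extras: tuple[str, ...] = ()) -> list[str]:
    """Build a `poetry add` command, optionally with extras (hypothetical helper)."""
    spec = f"{package}[{','.join(extras)}]" if extras else package
    return ["poetry", "add", spec]


command = poetry_add_command("requests", ("socks",))
# shlex.join quotes the brackets so the command can be pasted into a shell
print(shlex.join(command))
# poetry add 'requests[socks]'
# To actually run it: subprocess.run(command, check=True)
```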
This will then show up in the pyproject.toml file:
Setting up Linters
Linters are tools that check and improve your code. This can be anything from formatters to spell checkers to code quality checkers. I recommend always applying the following and ensuring that no code that fails them is committed through Git:
- Black, a PEP 8 compliant, opinionated formatter hosted by the Python Software Foundation.
- Ruff is a linter that is extremely fast and replaces Flake8 (plus dozens of plugins), isort, pydocstyle, yesqa, eradicate, pyupgrade, and autoflake.
- Pylint which checks for errors, enforces a coding standard, looks for code smells, and can make suggestions about how the code could be refactored.
- mypy which is a type checker that helps ensure that you’re using variables and functions in your code correctly in accordance with PEP 484.
- bandit which is designed to find common security issues in Python code.
- codespell which signals common misspellings in text files.
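As a small illustration of what these tools look for, the function below is formatted in Black's style, has a docstring (checked by Ruff's pydocstyle rules when enabled) and carries type hints that mypy verifies. The function itself is a hypothetical example, not part of the Finance Toolkit.

```python
def compound_growth(value: float, rate: float, years: int) -> float:
    """Return `value` grown at a constant annual `rate` for `years` years."""
    return value * (1 + rate) ** years


print(compound_growth(100.0, 0.05, 3))

# mypy would reject the call below, because a str is passed where a float is expected:
# compound_growth("100", 0.05, 3)
```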
These linters can be configured inside the pyproject.toml file, as seen below:
The .pre-commit-config.yaml file configures which linters should be run prior to committing code. Running pre-commit install sets up the Git hook that runs the linters defined in this file.
I recommend downloading this .pre-commit-config.yaml file and using it as a template for your own projects. You can deviate from it as much as you like, but it is a great initial template that uses the most common linters.
Once installed, the hooks run automatically on each commit. For example, a commit could look like the following:
Based on the files I have added to my commit, I can see which linters have passed and which have failed. In this case, Black and Ruff have failed: Black has reformatted the code, and Ruff has found a magic value in the code that it asks me to change. There are multiple ways to deal with this:
- I can store the magic value in a well-named variable, since the issue is that a bare integer has little meaning to the reader. This is the preferred way as it makes the code more readable.
- I can add # noqa to the line that is causing the issue. This is mostly relevant if the code remains self-explanatory. E.g. if you want to check whether the dataset has a single column or multiple columns, you could use if len(df.columns) == 1: # noqa, as it is clear what the code does.
- I can add an exception to the pyproject.toml and .pre-commit-config.yaml files to ignore this specific error. This is not the preferred way, as it does not solve the issue but rather hides it. However, in some cases hiding it is acceptable.
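The first (preferred) option could look like the sketch below. The constant name and the threshold of 30 observations are hypothetical; the point is that a named constant tells the reader what the number means, which is what Ruff's magic-value rule (PLR2004, part of its Pylint rule set) asks for.

```python
# Hypothetical threshold: the minimum number of daily returns needed for a meaningful statistic
MINIMUM_OBSERVATIONS = 30


def has_enough_history(returns: list[float]) -> bool:
    """Check whether enough observations are available to compute a statistic."""
    # Flagged by PLR2004: `return len(returns) >= 30`
    return len(returns) >= MINIMUM_OBSERVATIONS


print(has_enough_history([0.01] * 10))
# False
```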
Once everything has been resolved, the code can be committed. The beauty of this is that code is always checked before it is committed.
Creating a Git Workflow
While working on code, it is important to follow a solid Git workflow that separates development from production. I distinguish between working alone and working in a team here, because you shouldn't overcomplicate things: needing to approve your own PRs from a feature branch is nonsense if you are the only one working on the project.
My usual approach contains at least the following branches:
- Main: this is your production branch and should only contain versions of the code that are fully production-ready. You should not push directly to main but rather merge from a development branch.
- Develop: this is your development branch and should contain the latest version of the code that is not yet production-ready. You should only push directly to this branch if you are the sole programmer. Otherwise, merge from a feature branch.
This results in the following structure:
If you are working in a team, it is important to have a proper workflow in which code reviewers and testers approve changes before anything is merged into the development branch. This prevents issues introduced in someone else's feature branch from affecting work that is unrelated to those changes. The following branches should be included:
- Feature: these are your feature branches and contain the code that you are working on. They are merged into the development branch through PRs. There is a separate feature branch for each feature being worked on, and each is deleted once its feature is merged into the development branch. For simplicity's sake, the graph below merges the feature branches into one, but in reality there are many more.
- Hotfix: this is your hotfix branch and should contain the code that is required to fix a bug in production. This is only used in the rare case that a bug is found in production and needs to be fixed immediately.
This results in the following structure:
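The branch rules above can also be written down as data, for instance to check pull-request targets in a CI job. This is a minimal sketch of one possible encoding: the branch-name prefixes (feature/ and hotfix/) and the assumption that hotfixes are merged back into develop as well as main are illustrative choices, not part of the workflow described above.

```python
# Allowed merge targets per branch type (illustrative encoding of the workflow above)
SOLO_MERGES = {"develop": {"main"}}
TEAM_MERGES = {
    "develop": {"main"},
    "feature": {"develop"},
    "hotfix": {"main", "develop"},  # assumption: hotfixes also flow back into develop
}


def is_allowed_merge(source: str, target: str, team: bool = False) -> bool:
    """Return True if merging `source` into `target` follows the branch model."""
    branch_type = source.split("/", 1)[0]  # "feature/new-ratio" -> "feature"
    rules = TEAM_MERGES if team else SOLO_MERGES
    return target in rules.get(branch_type, set())


print(is_allowed_merge("feature/new-ratio", "develop", team=True))  # True
print(is_allowed_merge("feature/new-ratio", "main", team=True))  # False
```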
This is my way of doing things but there are many other ways to do this. The most important thing is that you have a workflow in place that works for you and your team and makes a proper distinction between production and development.