Build your Financial Model

Jeroen Bouma
6 min read · Mar 5, 2024

A financial model can have multiple purposes. It can range from a simple aggregation of data to a more complex model that includes forecasting and scenario analysis. In any case, the model should be built in such a way that it is easy to understand, maintain and extend. This is where modular programming comes in, the approach described in Structure your Model. For more examples of this method, and inspiration on how to build your own model, have a look at the Finance Toolkit, Finance Database, OpenBB Terminal, yfinance and Riskfolio-Lib.

Whatever the purpose of your model is, the following styling and coding guidelines should be applied to take out the subjective nature of coding. Applying a style guide will ensure that all code is written in the same way and therefore is easier to read and maintain.

The linters, as discussed in Setting up your Project, will do much of the initial styling of the code for you. However, the methods you use to code, how you choose to name your variables and what docstring structure you employ are not things linters will be able to help you with. This is where the Style Guide PEP8 comes in.

This guide is part of a series related to building financial models.

Default Styling

Throughout this page, PEP is frequently referenced. PEP stands for Python Enhancement Proposal. A PEP is a technical design document for the Python community which describes a new feature for the language itself, its processes, or its environment. This is something developers have agreed on and therefore there is no point in reinventing the wheel for this matter.

The styles as described by PEP8 are used for general code structures. This is the default style that is adopted by countless developers. By applying this style, the style is the same for both internal and external tooling. It is recommended to browse through the PEP8 documentation to get a better understanding of each component. Within this section, the major components are summarised.

For the code layout, this results in the following:

  • Indentation: use 4 spaces as indentation level. This is the default in basically any code editor.
  • Maximum line length: 79 characters is recommended but there is room to deviate from this. PEP8 allows teams to agree on up to 99 characters, and 122 characters is often applied as well. The 79-character limit was chosen because of the screen resolutions of the time. Screens have since improved greatly, which loosens this suggestion.
  • Line Breaks: should follow the logic of mathematics, in which a long expression is broken before the operator so that the operator sits in front of the variable and not behind it.
  • Blank Lines: space out functionality accordingly. Top-level functions and classes are surrounded by two blank lines and methods inside a class are surrounded by one blank line. Use blank lines within functions to separate logical sections.
  • Source File Encoding: code should always use UTF-8. Next to that, code should be written in English except for specific abbreviations.
  • Import statements: imports of different modules are always written on separate lines, and whatever is imported is named explicitly. Thus it should not be from package import * but instead from package import module, module2.
  • Module Level Dunder Names: any dunders (e.g. __version__) should be placed after the module docstring and before the import statements (with the exception of the __future__ import).

Naming Conventions

The naming conventions for each type of variable are as follows:

  • Classes: use CapWords like Ratios or RatiosClass.
  • Functions: use lowercase with a verb like get_gross_margin.
  • Variables: use lowercase like margin or gross_margin.
  • Constants: use uppercase like PERIOD. Python does not enforce this, but by convention these variables should never change.
  • Internal Variables: use an underscore at the start like _income_statement. This is meant for class-based systems to differentiate variables accordingly. Generally you won't use these variables outside of the class.

The goal of the naming convention is to make variables recognizable from the way they are written down. This makes it possible to understand the type of variable without needing to look for the variable declaration. By convention, I will know that get_gross_margin executes a function whereas gross_margin and PERIOD hold data.

It is also important to make variables as descriptive as possible. For example, a variable should never be called df as it has little meaning. It is better to have a variable called microsoft_trailing_gross_margin than to use msft_ttm_gm, because code is read far more often than it is written. A commonly cited rule of thumb is that the time spent reading code is about ten times the time spent writing it.
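
Put together, the naming conventions could look like the sketch below. The Ratios class, its contents and the figures are made up for this illustration and are not taken from an existing package.

```python
PERIOD = "quarterly"  # constant: uppercase and never reassigned


class Ratios:
    """Hypothetical class that groups ratio calculations (CapWords name)."""

    def __init__(self, income_statement: dict[str, float]) -> None:
        # Internal variable: the underscore signals class-internal use.
        self._income_statement = income_statement

    def get_gross_margin(self) -> float:
        """Return gross profit as a fraction of revenue."""
        revenue = self._income_statement["revenue"]
        cost_of_goods_sold = self._income_statement["cost_of_goods_sold"]

        return (revenue - cost_of_goods_sold) / revenue


# Descriptive names read better than df or msft_gm (figures are made up).
microsoft_income_statement = {"revenue": 200.0, "cost_of_goods_sold": 60.0}
microsoft_gross_margin = Ratios(microsoft_income_statement).get_gross_margin()
```

Without looking at any declaration, the casing alone tells you that Ratios is a class, get_gross_margin is a function, microsoft_gross_margin is data and PERIOD is a constant.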

Applying Typing

All variables should contain typing on initialization. This means that when you create a new variable, it should display exactly what type it could be. This is defined in PEP 526 (variable annotations) and PEP 484 (type hints).
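
As a minimal sketch of typed variables, with hypothetical names and made-up values:

```python
from typing import Optional

ticker: str = "MSFT"
fiscal_year: int = 2023
gross_margin: float = 0.4
is_quarterly: bool = False
revenue_by_year: dict[int, float] = {2022: 100.0, 2023: 110.0}

# Optional marks a value that can legitimately be missing.
dividend_yield: Optional[float] = None
```

The built-in generic syntax dict[int, float] assumes Python 3.9 or newer. On older versions the equivalent is Dict[int, float] from the typing module.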

This also applies to functions, where both the arguments and the return value are typed.
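
For a function, this could look like the sketch below. The function name and its parameters are hypothetical.

```python
def get_net_profit_margin(net_income: float, revenue: float) -> float:
    """Return net income as a fraction of revenue."""
    net_profit_margin: float = net_income / revenue

    return net_profit_margin
```

The signature alone now shows that two floats go in and one float comes out.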

Not defining typing lowers the quality of the code as the user will need to read the docstring or code first before it is possible to understand what to supply.

Writing Docstrings

Docstrings should follow PEP 257. This convention is widely accepted by developers and is used within many code editors as the default as well.
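
As a sketch of what such a docstring could look like, here is a hypothetical get_gross_margin function documented in the Google format. The function itself is an assumption made for this example.

```python
def get_gross_margin(revenue: float, cost_of_goods_sold: float) -> float:
    """Calculate the gross margin of a company.

    Args:
        revenue (float): Total revenue over the period.
        cost_of_goods_sold (float): Direct costs of producing the goods
            or services sold over the same period.

    Returns:
        float: Gross profit divided by revenue, as a fraction
            (0.4 means 40%).

    Raises:
        ValueError: If revenue is zero, because the margin is undefined.
    """
    if revenue == 0:
        raise ValueError("revenue must be non-zero to calculate a margin")

    return (revenue - cost_of_goods_sold) / revenue
```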

The Google format is only one option; there are other formats such as reStructuredText as well. Which one you choose doesn’t matter as long as the docstrings you write explain what the function does, what arguments it takes and what it returns. A docstring needs to provide enough clarity that the user can understand the purpose of the function without having to read the code.

I recommend being as extensive as possible: it is better to overdo it than to have a minimal docstring which still doesn’t really explain what is going on. Docstrings also give you the room to explain the financial theory or the logic of the function, and a deliberately extensive docstring shows how much is possible.
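
To illustrate how far this can go, below is a sketch of a deliberately extensive docstring that also explains the financial theory. It is written for this guide around a hypothetical get_sharpe_ratio function and is not copied from an existing package.

```python
from statistics import mean, stdev


def get_sharpe_ratio(
    asset_returns: list[float],
    risk_free_rate: float = 0.0,
) -> float:
    """Calculate the Sharpe ratio of a series of periodic returns.

    The Sharpe ratio measures the return earned in excess of the
    risk-free rate per unit of risk, where risk is the standard deviation
    of the excess returns. A higher value means more return for the same
    amount of volatility, which makes it a common way to compare
    investments with different risk profiles.

    The formula is as follows:

        Sharpe Ratio = mean(excess returns) / stdev(excess returns)

    where the excess return is the asset return minus the risk-free rate
    of the same period. The result is not annualized: it is expressed in
    the frequency of the returns that are supplied.

    Args:
        asset_returns (list[float]): Periodic returns as fractions, e.g.
            0.02 for 2%. At least two values are required.
        risk_free_rate (float): Risk-free rate per period, in the same
            frequency as the returns. Defaults to 0.0.

    Returns:
        float: The Sharpe ratio in the frequency of the supplied returns.

    Raises:
        ValueError: If the returns do not vary, because the ratio is
            then undefined.
        statistics.StatisticsError: If fewer than two returns are
            supplied.
    """
    excess_returns = [
        asset_return - risk_free_rate for asset_return in asset_returns
    ]
    volatility = stdev(excess_returns)

    if volatility == 0:
        raise ValueError("the returns do not vary, the ratio is undefined")

    return mean(excess_returns) / volatility
```

The docstring is several times longer than the code, which is fine: someone who only sees the pop-up in their editor can still understand what the ratio means and how to read the result.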

Creating Documentation

Besides styling and docstrings, documentation is actually pretty important if you want to share your code with others. This is where Sphinx comes in. Sphinx is a tool that makes it easy to create intelligent and beautiful documentation for Python projects (or other documents consisting of multiple reStructuredText or Markdown files).
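
Sphinx is configured through a conf.py file, which is itself plain Python. A minimal sketch could look as follows. The project name, author and file location are placeholders; the listed extensions ship with Sphinx.

```python
# docs/conf.py (hypothetical location; all values are placeholders)
project = "Financial Model"
author = "Your Name"

extensions = [
    "sphinx.ext.autodoc",  # pulls docstrings into the documentation
    "sphinx.ext.napoleon",  # parses Google and NumPy style docstrings
    "sphinx.ext.viewcode",  # links documented objects to their source
]

html_theme = "alabaster"  # theme that ships with Sphinx
```

With autodoc and napoleon enabled, docstrings in the Google format can be rendered in the documentation without rewriting them in reStructuredText.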

This is not the only approach to creating documentation. There are other tools such as MkDocs, and hosting platforms such as Read the Docs, which are also widely used. The documentation of the Finance Toolkit is an example. It uses custom JavaScript but the result is similar to what you can achieve with Sphinx.

Documentation shouldn’t just be about describing each individual function. It should also feature Jupyter Notebooks demonstrating use-cases. This helps in understanding the logic behind the model and what it could be used for. These should be saved in an “examples” folder as also shown in Structure your Model. The Options Notebook from the Finance Toolkit is an example of this.

When working in a corporate setting, do not forget about the function of the Wiki (e.g. from Azure DevOps). This is a great way to share information with your team and to document your code on a higher level while still being able to use Markdown and version control.

If you have designed proper docstrings, the documentation can be used to explain the overall structure of the model and how to use it instead of explaining each individual function. Especially when the model serves as the back-end, it can actually help Financial Analysts, Portfolio Managers and people in similar roles to understand what the model does without needing to understand any programming language. This is a great way to bridge the gap between technical and non-technical professionals as well.

Jeroen Bouma

With experience and education in Quantitative Finance, my ambition is to continuously improve in the areas of Quant Finance and Python programming.