GETTING STARTED | FINANCIAL MODELLING | KNIME ANALYTICS PLATFORM
No-code Financial Modelling Pipelines in KNIME
How to create a reliable financial modelling pipeline that delivers a uniform data structure for easy consumption
I’ve recently explored KNIME’s capabilities when it comes to financial modelling. I have documented the process of building a minimal viable product in a YouTube video series, which you can find via the link at the very end of this article. I thought about how to fit the content of four videos (to date) into a single article and concluded that the best approach was not to try at all, but to cover the raw basics and offer this article as a pre-read that lets you decide whether diving into the topic in depth is for you.
This article guides you through 6 steps that turn what looks like a very simple data set of just 4 rows into a bottom-up data set containing periodic accounting journals that make the transactions flow from the Profit and Loss Statement to the Balance Sheet and finally to the Cash Flow Statement. The final data set will contain almost 1,500 rows of transactional data, and it will take about 20 nodes and not a single line of code to transform it.
Financial Modelling — Why in KNIME?
Let’s start with why I thought it would be a good idea to build what I call a Financial Modelling Pipeline in KNIME:
My background is in Finance, Controlling and Accounting. During my career, which spans almost 20 years across various functions and companies, I have had to build financial models for many different occasions: short-term models to produce budgets and forecasts in FP&A, offer/bid models to turn various costing and profitability inputs into a final customer price, operational/accounting models to apply complex accounting rules like IFRS 15 revenue recognition and IFRS 16 leasing, and long-term project and business plan models to quantify and assess the opportunity of new business ventures.
I’ve used various technologies to achieve this: from basic models in spreadsheet tools like Excel, to combining spreadsheets with newer technologies, e.g., capturing input in a spreadsheet, handling the calculations in PowerPivot or Power BI, and storing results in databases like Access or SQL. I have also repeatedly come back to KNIME.
I’ve witnessed first-hand many cases where traditional spreadsheet and formula-based models that were supposed to be “robust” actually caused issues:
- That one time a project requiring board approval looked far too appealing, only because of formula inconsistencies caused by user error.
- All those times the numbers did not turn out as modelled, the question “why” came up, and we ended up sorting through many different versions… “vFinal_Draft3_Submitted”…
- Difficulties in answering supposedly easy questions because the answers had to be pulled out of a variety of different sheets.
- Complex, long-term models growing so large that frequent crashes occur and the user experience suffers.
- Trying to understand complex formulas after being handed a complex model built by someone else (admittedly, I may have granted some former colleagues the same favour at times…).
Don’t get me wrong: this article is not about bashing spreadsheets or traditional financial modelling; it is meant to showcase and further explore what I have found to be a good alternative. The above is the negative side of my experience over almost 20 years. I had those experiences not necessarily because financial models are bad in general, but because not every financial model is built by a highly skilled modeller who can put the right safeguards in place; most are built by your average employee who somehow became known as the office Excel wizard.
So why KNIME again?
Ah right, I went off on a bit of a tangent there.
Well, KNIME allows you to build very complex logic step by step. This lets you document what your workflow is doing on the fly and, if done right, removes the need to decipher complex formulas that span 4 or more lines in your formula bar. With KNIME’s extensive data wrangling capabilities, it is simple to apply business logic to a dataset with just a few nodes. Especially when it comes to double-entry bookkeeping logic, it is easy to re-use the parts of your dataset that model the initial transaction: once you’ve worked out at what point in time your revenue should be recognized, you can first model the balance sheet side of that journal and then use the same outcome to model the balance sheet release and the cash flow postings. If you read on, you will see what I am talking about!
Introduction to data and model logic
I want to dedicate this article to narrowing down the very basics needed to get started with Financial Modelling in KNIME.
Along the way, these basics will be applied to a very simple example.
More specifically, we will generate a Financial Modelling Pipeline that models basic Profit and Loss, Balance Sheet and Cash Flow Accounting Journal entries for this simple data set below:
The above table outlines example planning data of a company that manufactures machines for industrial use and offers maintenance services. For each planned delivery the table holds Total Revenue, Total Cost, a Start Date, a Duration over which costs and revenues will be recognized, and a Type: either Product Sales or Service Delivery. From a logic perspective, a duration of 120 means that the total amounts of Revenue and Cost should be spread equally across 120 months, starting with the Start Date.
So what are we going to do with this data? We will put logic in place within our workflow to generate accounting journal entries that follow double-entry bookkeeping practice. This means that, via a range of accounting journals, we will make the above transactions flow from the Profit and Loss Statement, where we recognize revenues, costs and the resulting profit, to the Balance Sheet, where we record the resulting assets (invoices to be paid by customers) and liabilities (e.g., invoices to be paid to suppliers), and finally to the Cash Flow Statement, where we record the expected incoming payments from customer invoices and the outgoing payments for supplier invoices.
The image below gives you an overview of the journals we will model:
When modelling our data there is one decision to make before we get started: which of the two sides, debit or credit, gets the negative sign and which the positive one? This is a decision worth making consciously because, from a pure accounting perspective and somewhat counterintuitively, credits are generally shown with a negative sign. On the assumption that the output of this model will be consumed by decision makers rather than accountants, credits will be shown as positive and debits as negative in this model. We will also assume that, in general, each supplier invoice is paid 2 months after it was received and each customer invoice is paid 1 month after it was issued.
Both the example data and the journal overview will be available with the workflow on my KNIME Community Hub — you can find the link at the bottom of this article.
Now let’s move ahead and build this workflow!
Step 1 — Reading in the data
We start simple by reading in the data. I have saved the data in .xlsx format in the “Current Workflow Data Area” (a folder named “data” inside the workflow folder) and read it from there with an Excel Reader node:
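The workflow itself stays entirely no-code, but throughout this article I will add small pandas sketches for readers who like to see the same logic as code. They are purely illustrative and make assumptions about file and column names; this first one simply mirrors the Excel Reader:

```python
import pandas as pd

# Hypothetical file name; in KNIME the file sits in the workflow's "data" folder.
plan = pd.read_excel("data/planning_data.xlsx")

# Columns expected from the description above:
# "Revenue", "Cost", "Start Date", "Duration (Months)", "Type"
print(plan.head())
```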
Step 2 — Adding columns for Revenue and Cost recognition
Next we use the new Expression node to add a total of four new columns to the data set. Each column will show the periodic value (Cost or Revenue divided by Duration), and we will also ensure the values have the correct sign according to the logic outlined above: “pRevenues” and “pAccountsPayable” will be positive, whereas “pCostsOfGoodsSold” and “pAccountsReceivable” will be negative. For good measure, we round the values to two decimal places. For this, we can use one node to generate 4 expressions using the KNIME Expression language. For “pAccountsReceivable”, for example, we use this expression:
-round($["Revenue"] / $["Duration (Months)"],2)
Step 3 — “Explode” the data set
Next, we want to ensure that we have one row per period that each of the original rows will generate transactions for, i.e., the number of months specified in the Duration (Months) column. There are different ways to achieve this (in my videos I typically used a loop), but here I have a very simple trick: the One Row To Many node!
Simply select the column that holds the number of rows required, in our case the column “Duration (Months)”, and this node takes care of the rest for us! If you inspect the image below closely, you will notice that the RowID column looks interesting: each original row gets “m_” followed by a number appended to it. This will come in incredibly handy in the next step, as it is a sequential numbering of the rows generated for each original row.
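Conceptually, the node repeats every input row as many times as the selected column says and numbers the copies per original row. A rough pandas sketch of the same idea, continuing the hypothetical plan table from the earlier sketches:

```python
# Repeat each row "Duration (Months)" times ...
exploded = plan.loc[plan.index.repeat(plan["Duration (Months)"])].copy()
# ... and keep a per-row counter, similar to the suffix KNIME adds to the RowID.
# Note: this counter starts at 0, whereas the RowID suffix starts at 1.
exploded["periodNo"] = exploded.groupby(level=0).cumcount()
exploded = exploded.reset_index(drop=True)
```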
Step 4 — Generating the right date for each journal
Now that we have one row for each period that revenues and costs will be recognized for, we need to determine the correct date. As mentioned above, a good starting point is a sequential numbering of the newly generated rows in step 3. This can be extracted from the end of the current RowID — more specifically we need to extract anything after the “_” — e.g. from “Row1m_2” we want to extract 2.
To do this we perform a series of transformations. There are ways to do this with a node or two fewer, but in order to stay entirely “no code” I decided to take a slightly longer but easier-to-understand route (a condensed code sketch of the whole step follows the list):
- 4a) Using the RowID node to extract current RowIDs into a new column “periodNumberMapping”
- 4b) Using the Cell Splitter By Position node to split the newly created column “periodNumberMapping” at the 6th character to generate two new columns: “x” for the first characters that we do not need anymore and “periodNo” for the relevant part after the “_”
- 4c) Using an Expression node to convert the number in column “periodNo” from string to integer and to subtract 1. This prepares for creating the correct date in the next node.
- 4d) Using the Date&Time Shift node to shift each rows Start Date by the number in column “periodNo”. As the sequential numbering in the RowID we extracted before starts with 1, we deducted 1 already in 4c to avoid that e.g. if a Start Date was 2025–05–01 that this is shifted to 2025–06–01 in the first row.
- 4e) Using the Column Filter node to remove columns we won’t need in the next steps
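Taken together, 4a to 4e boil down to one idea: shift each row’s Start Date forward by its period number, in months. A rough pandas sketch, continuing the hypothetical table from step 3 (where periodNo already starts at 0, so no subtraction is needed):

```python
# Shift each row's Start Date by its period number (in months).
exploded["Start Date"] = pd.to_datetime(exploded["Start Date"])
exploded["Start Date"] = exploded.apply(
    lambda row: row["Start Date"] + pd.DateOffset(months=int(row["periodNo"])),
    axis=1,
)

# Keep only the columns needed downstream (roughly what the Column Filter does).
recognition = exploded[["Type", "Start Date", "pRevenues", "pCostsOfGoodsSold",
                        "pAccountsReceivable", "pAccountsPayable"]]
```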
For brevity I did not screenshot each individual configuration dialogue. If interested, please go ahead and download the workflow linked at the bottom and inspect it node-by-node.
Step 5 — Cash Collection and Cash Outflow Journals
In case you have not realized it yet: for the journals listed as 1. for both revenue and cost, we are done! The data set is complete in the sense that we have the correct period values and the correct dates for each journal we need, according to the model logic we set out to build.
We are now close to the goal of having all journals modelled.
We now branch the workflow: the upper branch removes all value columns except “pAccountsReceivable”, and the bottom branch does the same but keeps “pAccountsPayable”. As we can see from the journal entries, both of these accounts will receive another posting, but with the initial sign inverted and in a different month.
In both branches, we start by determining the correct month for the next posting: the Start Date in the upper branch is shifted forward by 1 month, the Start Date in the bottom branch by 2 months. For the sake of simplicity and brevity, both values are hardcoded for now; there are ways to handle this more flexibly, e.g., to account for different payment terms per product or customer.
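As a rough pandas sketch of the upper branch (the payables branch works the same way with a 2-month shift and “pAccountsPayable”; the exact sign of the Bank leg follows the journal overview, here it is simply taken as the offsetting amount):

```python
# Upper branch: cash collection, posted one month after the receivable arose.
cash_in = recognition[["Type", "Start Date", "pAccountsReceivable"]].copy()
cash_in["Start Date"] = cash_in["Start Date"] + pd.DateOffset(months=1)

# Release the receivable with the inverted sign and book the offsetting Bank leg.
cash_in["pAccountsReceivable"] = -cash_in["pAccountsReceivable"]
cash_in["Bank"] = -cash_in["pAccountsReceivable"]

# The lower branch ("cash_out") mirrors this with pd.DateOffset(months=2)
# and the "pAccountsPayable" column.
```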
With the above in place, we now have three nodes that hold all the journals we had to model: 4e for all initial journals covering revenue and cost recognition, and the two 5c nodes for the final journals covering cash collection and cash outflow:
Step 6 — Merging data sets into one uniform data structure
In order to finalize our pipeline and bring all data together in one final table, we start by adding a Column Renamer node after each of the 3 nodes with relevant data to rename the column “Start Date” to “date”.
After that, we need to work towards exactly the same column names across all three tables. The tricky part is how to deal with the columns that hold the data for the different accounts: 4e has “pRevenues”, “pCostsOfGoodsSold”, “pAccountsReceivable” and “pAccountsPayable” but no Bank column, while the 5c nodes have Bank and either “pAccountsReceivable” or “pAccountsPayable”.
To solve this we disaggregate, or flatten, all three tables by using an Unpivot node. This moves the above-referenced columns into individual rows. The outcome is a table structure that is identical across all three tables after unpivoting: each now holds a column “ColumnNames” and a column “ColumnValues”, and correspondingly more rows than before.
With all tables in the same structure we can now use Concatenate to bring them all into one table.
The last and very much cosmetic step is to do some renaming again: the “ColumnNames” column turns into “Account” and “ColumnValues” turns into “Values”.
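As a rough pandas sketch of this final step (melt is the pandas analogue of the Unpivot node; “recognition”, “cash_in” and “cash_out” continue the earlier hypothetical sketches):

```python
def unpivot(df, value_columns):
    """Flatten account columns into (ColumnNames, ColumnValues) rows."""
    id_columns = [c for c in df.columns if c not in value_columns]
    return df.melt(id_vars=id_columns, value_vars=value_columns,
                   var_name="ColumnNames", value_name="ColumnValues")

account_columns = ["pRevenues", "pCostsOfGoodsSold",
                   "pAccountsReceivable", "pAccountsPayable"]

tables = [
    unpivot(recognition.rename(columns={"Start Date": "date"}), account_columns),
    unpivot(cash_in.rename(columns={"Start Date": "date"}), ["pAccountsReceivable", "Bank"]),
    unpivot(cash_out.rename(columns={"Start Date": "date"}), ["pAccountsPayable", "Bank"]),
]

# One uniform table, plus the cosmetic renaming at the end.
journals = pd.concat(tables, ignore_index=True)
journals = journals.rename(columns={"ColumnNames": "Account", "ColumnValues": "Values"})
```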
Wrapping up
The above 6 steps outline how to create a reliable financial modelling pipeline that delivers a uniform data structure which allows for easy data consumption, e.g., via common Business Intelligence tools. While the example chosen was deliberately kept simple, more complex scenarios can be modelled by following the same principles.
If this tutorial has made you want to find out more, I recommend checking out my video series on YouTube, which covers building even more functionality: saving versions, building dashboards with waterfall diagrams to dynamically analyze differences between saved versions, and more.
Resources
Contact for more information: