How to build an ETL pipeline for your BI Financial Model?

Data Infrastructure design explained simply

Ehab Hasan

--

Introduction

Hi! This is Sam:

Sam is a financial modeler who workes day and night, building financial models to support investment decisions.

Come closer and let us meet Sam!

— Hi Sam, can you tell us a little bit about your work? :)

—Oh, hi! Well, my main responsibility is to utilize the best forecasting and financial analysis techniques and methods, so that I can build predictive models. For this, I use historical financial statements (Cash Flow Statement, Income Statement, and Balance Sheet) to estimate the value of an investment by forecasting its future cash flows.

This is what the financial modeling process looks like:

In a nutshell, I analyze historical data to predict the value of a company.

However, I spend most of my time extracting data, checking its quality and doing the calculations…I also have to create reports to capture my findings. All this processing takes an awful lot of time! :(

— What if I tell you that I have a solution to automate a great deal of that process?

The Solution

It’s time to get serious! I set out to write this article because I want to share my insights into designing the ETL pipeline of BI financial modeling. This article is the first in a series I am planning to publish; now I am going to explain the basic structure of my solution and go into detail in each of its components in the following posts.

The main goal of the solution is to improve the financial modeling process by reducing time in the completion of routine tasks and to minimize human errors.

The following diagram shows the data infrastructure of the ETL pipeline:

  1. The data pipeline starts by extracting historical financial data from multiple sources.
  2. Validate the data by comparing between multiple data sources. If there’s an inconsistency between the two sources a notification will be received. Otherwise, the process will continue to the next phase.
  3. The data will be loaded into the financial database and the financial ratios will be calculated.
  4. By retrieving the data using Excel tools, the manual process begins. Assumptions will be made and subsequently, forecasts will be created.
  5. After saving the Exel file in a shared folder, the automated process will continue by validating data. If there’s an inconsistency between the two sources a notification will be received. Otherwise, the process will continue to the next phase.
  6. Finally, the data will be loaded into a relational data warehouse.
  7. The last part will be connecting the data warehouse to the front-end BI application to create real-time reporting and facilitate efficient data visualization.

Tools:

SSIS, SQL Server Integration Services

SSMS, SQL Server Management Studio

SSAS, SQL Server Analysis Server

Power BI

Advantages of the solution:

  • Time-saving: The process will be automated, so the users will no longer need to spend time extracting, validating, calculating and loading data. They can now focus on building robust financial models.
  • Data quality: Validating the data from multiple sources will be automated. In addition, the data will be validated again after building the financial models to make sure that the raw financial data has not been modified by mistake.
  • Simplified process: The model will simplify the process where users will no longer need to use several programs, it’s enough with MS Excel. By developing an Excel model using VBA, the users will be able to extract the data instantly from the financial database.
  • Dashboards in real-time: By connecting the data warehouse to the BI tools, the users will be able to visualize the financial data in real-time.
  • Cost-saving: The model will only use the organization available resources, i.e. MS tools.
  • Accessibility: Saving the historical and processed financial data in one data storage will increase accessibility.
  • Accountability: By creating a reference for every process.
  • Flexibility: The model can be developed at any time, adding or removing any process won’t affect the subsequent processes.

Disadvantages of the solution:

  • A lot of testing and troubleshooting.
  • Knowledge in SQL, DBMS tools, VBA, Network Server Setup, Microsoft Analysis & Integration Services is required.

What do you think, Sam?

--

--