SQL queries for financial modeling

Natalia Colomé
7 min readApr 6, 2018

--

This blog post is related to My capstone project: from Excel to Rails + React.
While you can read about the reasons why I developed this tool here, here I will detail the models, their relationships and the queries used.

Profit & Loss Statement

The profit & loss statement presents the company’s revenues obtained through sales, and the costs and expenses incurred to generate those sales to determine if the company is making or losing money during a period of time.

Depending on how much detail we need, this can be translated to:

Where discounts are any reduction on price, costs of goods sold are those directly related to providing the product or service we offer, and other expenses are those that support and are necessary to the business but are not directly related to the delivery or production of the product or service.

Inside those red blocks (expenses) we can be as detailed as we want or need to be. For example:

To name a few

Knowing this, we build our models with the following relationship (focus on Sales):
Each sale has discounts, costs of goods sold and operating expenses associated to it.
We also know that each sale has a client, region and product associated to it.

In the Sale model we include the associations and add dependent: :destroy to its relationship with discounts , cost_of_goods_sold and operating_expenses; that way, if we delete a sales record, all its associated costs will be deleted too (we won’t be tackling returns for simplicity).

In the query we are defining the columns we want: month, year, amount of sales, discounts, cogs (cost of goods sold) and expenses, from a table built by joining the sales, discount, cogs and operating_expenses tables through their sales_id. Finally, we want our information grouped by month and year, and we want the data to be sorted by year and month so we have chronological order.
The result looks something like this:

 month | year |  sales   | discount |   cogs   | op_expense
-------+------+----------+----------+----------+------------
1 | 2016 | 10548.31 | -2406.42 | -5584.69 | -1441.04
2 | 2016 | 11303.88 | -2560.45 | -5879.57 | -1757.54
3 | 2016 | 11295.84 | -2686.86 | -6366.02 | -1035.60
4 | 2016 | 11177.44 | -2677.67 | -5969.24 | -1528.03
5 | 2016 | 11947.07 | -2744.97 | -6265.05 | -1787.97
6 | 2016 | 11650.48 | -2552.37 | -6054.66 | -2192.36
7 | 2016 | 11330.94 | -3043.80 | -5982.17 | -1295.88
8 | 2016 | 11845.38 | -2746.24 | -6171.20 | -1792.38
9 | 2016 | 13588.67 | -3658.37 | -7242.13 | -1567.51
10 | 2016 | 9767.16 | -2009.96 | -3146.44 | -3733.21
11 | 2016 | 11359.25 | -2361.53 | -5217.12 | -3014.03
12 | 2016 | 10376.99 | -2674.69 | -5280.14 | -1569.31
1 | 2017 | 11369.63 | -2637.76 | -5955.84 | -1879.22
2 | 2017 | 11207.33 | -2406.19 | -5721.36 | -1941.44
3 | 2017 | 13673.06 | -2906.08 | -6707.17 | -2692.25
4 | 2017 | 13641.98 | -2938.09 | -6336.69 | -2807.85
5 | 2017 | 13749.30 | -3856.65 | -6191.09 | -2594.32
6 | 2017 | 13774.37 | -3357.88 | -6967.75 | -2361.36
7 | 2017 | 14841.94 | -3580.32 | -6875.13 | -3324.54
8 | 2017 | 14749.18 | -3719.81 | -6899.25 | -3040.54
9 | 2017 | 14590.38 | -3879.09 | -7251.94 | -2493.67
10 | 2017 | 13812.89 | -3586.09 | -6506.92 | -2597.62

Then, we use a Serializer to customize the JSON output to include Net Sales, Gross Profit and Net Profit:

We are adding because we have all expenses stored as negative values

Growth and Contribution

In my Dashboard view I wanted to have a few indicators that would help me know if my sales are growing and who are the key players of that growth.

Dashboard Cell (content = equation)

This cell is calculated for each client_cluster/product combination and client_cluster/product_category combination. The building blocks for it are revenue and discount for the current year and previous year broken down by customer and product.
In order to compare it, we need to have ‘year-to-date’ data. Instead of hard coding a specific month we can get what is the latest month in the database through the following ActiveRecord query:

What do we want? We want to get all dates associated to a Sale and get the latest month
Breaking it down:

  • order(transaction_date: :desc) sort all records based on transaction date in a descendent order
  • limit(1) with all the records sorted we limit the return to only the first record
  • pluck(:transaction_date)[0] we select only the transaction date from that record. Because pluck returns an array, we select the first element
  • month Now that we have the date element we can call the attribute month and get its numerical value

Now that we have out last month we can get the accumulative data:

What do we want to do? We want to get accumulative sales and discounts up to a ‘x’ month (year to date) for 2 separate years detailed by client_cluster, product and product_category (so we can filter them later in the view).

This was built through a WITH clause. WITH clausesmake it easier to build and use subqueries in a more organized way. The general structure is:

General structure of a WITH clause

In our case we have 2 subqueries, one to get sales & discounts for the current year (named ytd_2017) and the other to get the same data from the previous year (named ytd_2016).

Both subqueries are essentially the same: we select sales, discount, client_cluster, product and category columns. To get all this information we join Sales, Clients, Client Clusters, Region Products, Products, Categories and Discounts and we group them by client_cluster, category and product name (column 3, 5, 4).
The difference between both subqueries is the condition. For ytd_2017 we want the year of the transaction_date = 2017; for ytd_2016 we want the year of the transaction_date = 2016 AND the month of the transaction_date between 01 (January) to the latest month we have for the current year.

Once we have this two temporary tables we join them when the client_cluster and the product are equal. The result is:

Finally, the only piece we’re missing is Total company net sales:

This query will return the global net_sale. Now that we have all the pieces, we customize the output through a Serializer:

Top Customers

Because it is important to monitor the few clients that make up most of our revenue, the final view is specific for Customers.
We are going to return both their revenue and discounts for the current and previous year. This way we will know if there has been a growth in sales, a growth in our investment in them, and how the sales-investment relationship behaves.

To build it, first we have to find out who are our currently our top customers. For this case, we will return top 10 customers.

We will build a query similar to the one we used for the dashboard, with the difference that we will sort the records based on total net_sales and we will limit the return array to 10 records:

The result looks like this:

With this information we can tell:

  1. Increased Sales & Decreased Investment: They are growing without our support. What are they doing right?
  2. Increased Sales & Increased Investment: Our strategy is working and we are supporting our client’s growth. What are we doing right?
  3. Decreased Sales & Increased Investment: We are investing but our client is not buying. What is happening?
  4. Decreased Sales & Decreased Investment: A key customer has declining sales and we are not supporting them. What is happening?

While these queries work, there’s room for improvement. Time was tight when I was making this project, so once they worked I moved on, but I’m happy to hear about how you think I could improve these queries.

If you liked this post, please leave a 👏 !

--

--