A Fully Automated Approach to Cash Flow Forecasting

Sep 3, 2019

At chata.ai, we’ve leveraged many different AI technologies to create a platform that facilitates intuitive interactions with databases. We’ve created a tool that makes it easy to find answers in any volume of data, just by asking questions. Our app uses proprietary natural language understanding technology to translate everyday human language into database query language. As a start-up with a backbone of entrepreneurial spirit, we know that one of the important use cases for our app is finding insights in business and financial data when there is not a lot of time to do so. Machine learning tech is at the core of what we do, and we realized that our capabilities in that arena had something else to offer one of our current customer groups, the cloud accounting community, who need to create great cash flow forecasts but don’t have a lot of time.

We all wish we could predict the future, especially when money is involved. Cash flow forecasting (CFF) can’t tell you if your business venture will be successful, but it can be used as a powerful tool for making great decisions about the direction of your business and taking action toward the goals you want to achieve.

The forecasting system we’ve built was developed by one of our very talented PhDs and a group of amazing data scientists. Internally, we call it “Cognitive Cash Flow Forecasting”: an AI-driven system that learns and adapts from the data it ingests on an ongoing basis. Our goal is to provide users with a totally automated and objective forecast that’s based on data and not on manual interventions. We made some pretty remarkable discoveries about forecasting cash flow while developing this technology, and we want to share what we’ve learned and how we’ve approached CFF a little differently.

The Value of Cash Flow Forecasting

Typically, financial professionals provide cash flow forecasts under circumstances like: “My client needs to know if they are able to make payroll next month,” or “The owner wants to know if they can afford to take on this project based on the working capital requirements,” or “Based on the current growth and yearly cyclicality, can my client increase their dividend payouts?” It’s well known that poor cash flow (and decisions made using a poor CFF) can kill small businesses. What’s critical is that a forecast offers possibilities, typically a set of best-case, worst-case, and probable scenarios. When you can see these possibilities, you have a foundation for taking the next step forward. It may not matter exactly what the cash balance will be in three months; what matters is that you can decide on your next step with more confidence. Ultimately, the goal of the forecast is to enable clients to make better decisions about their business.

A cash flow forecast is a crucial part of decision making in the same way checking the weather report in the morning is part of preparing for the day ahead. The main reason you check the weather is to make choices about what clothes to wear or to gauge whether or not you need to pack an umbrella in your bag. Meteorologists may not be exact in their predictions of the weather, so you can never be absolutely certain that you’ll need an extra sweater, but it helps to know that you might. Cash flow forecasts do need to be more reliable than the weatherperson, but the reason you’d check with either forecast is very similar: you want to know what action to take in the present to ensure the highest probability of comfort and preparedness in the future.

There’s a lot of buzz about cash flow forecasting in the financial professional space right now because it’s such a critical part of keeping a business thriving. Businesses want to know how to succeed financially, and if financial professionals can provide an effective, actionable forecast, it is of huge value to new businesses and larger companies alike. Often, though, financial professionals focus on the exact forecasted numbers (for example, a positive cash flow of $20,000) rather than on what those numbers indicate (that their client can indeed afford that new piece of equipment to expand their services). It’s critical to make decisions and take smart risks based on a combination of great data and financial advice, but, as current systems stand, we see some limitations that can seriously affect how successfully businesses use forecasts.

Current Limitations in Cash Flow Forecasting

One limitation we see in current CFF systems is how much manual adjustments can undermine the realism and usefulness of the forecast. There’s a reason that you’re not allowed to touch the paintings in a museum, even if you’re an art expert: touching aging paint damages the color on the canvas and diminishes the value of the work. We see current CFF systems in much the same way. The data is a masterpiece of truth, but human intervention blurs the valuable insights that the data has to offer. Historical data is the story, or true narrative, of a business, and the best way to create a plausible picture of what will happen in the future is to assess what has happened in the past.

Introducing manual adjustments into the forecast ultimately takes away from the actual narrative that the historical data tells. Whether the user arbitrarily adjusts sales to be 10% higher or optimistically adjusts an expense account to be 15% lower, introducing this subjective human bias into an otherwise objective process is both problematic and time-consuming. Subjectivity undermines the power of forecasting, because a decision based on hopeful, or even irresponsible, thinking is more likely to yield poor outcomes.

Figure 1 — Historical Weekly Sales Narratives for 3 Businesses from chata.ai’s Sample Data

Another limitation we’ve found is that the forecasting assumptions current methods use are highly generic. Figure 1 shows graphs that reflect the data narratives of three different businesses. As you’ll note, the financial narratives of these businesses are constantly changing over time in varying, and sometimes surprising, ways. For a system to reflect the future of a business consistently and accurately, it needs to learn the patterns in that specific business’s historical data as time goes on. Such a system also needs to account for fluctuations, no matter how small or volatile, and learn those new patterns as well. An accurate rolling forecasting system should be able to ingest data as it changes over time and adapt its picture of the future accordingly.

Our Motivation and Contribution to the Cash Flow Forecasting Space

We are a data science-driven company that promotes a blend of advanced technology and creative problem-solving. We saw a need for CFF to be accessible enough that anyone can be empowered to more confidently make data-driven choices about their business. We also knew we could leverage sophisticated AI technology to make forecasting feel effortless for users, yet still provide a foundation for reliable output.

We’ve implemented the technology we’ve built as a feature within our platform and we call it Cognitive Cash Flow Forecasting. It’s built to address the limitations we see in certain forecasting methods and to add some ease to the whole CFF process, without compromising utility. Our goal is to provide users with a totally automated and objective forecast that’s based on the narrative in the historical data, not manual adjustments or generic assumptions.

A distinctive feature of Cognitive Cash Flow Forecasting is that it is designed as a learning system: it constantly learns from incoming data and adjusts the forecast it produces as its understanding grows. The system runs continually and automatically, identifying and factoring in changes in the data as the business fluctuates day-to-day. It is designed to improve itself iteratively as it acquires new information, so as time goes on, users will see their forecasts become increasingly accurate. As the system takes in and learns from new information, it returns a rolling forecast depicting three scenarios: a high estimate, the most probable prediction, and a low estimate. This automates the work that goes into creating a forecast for decision-making purposes. As the system continues to learn from the ever-growing number of data points generated by a business, it is able to predict potential cash futures more accurately.
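The article doesn’t describe the models behind Cognitive Cash Flow Forecasting, so the following is only a rough illustration of what a learning, rolling, three-scenario forecast means mechanically. It is a minimal sketch assuming simple exponential smoothing for the probable line and a band built from recent one-step errors for the high and low lines; the class `RollingForecaster` and its `alpha` and `width` parameters are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RollingForecaster:
    """Hypothetical rolling forecaster for illustration; not chata.ai's model.

    Tracks an exponentially weighted level of the series (the probable line)
    and an exponentially weighted mean absolute one-step error (the spread).
    """
    alpha: float = 0.3              # weight given to the newest observation (assumed)
    width: float = 2.0              # band half-width in units of the mean abs error (assumed)
    level: Optional[float] = None
    mae: float = 0.0

    def update(self, value: float) -> None:
        """Ingest one new observation and adapt the model to it."""
        if self.level is None:      # the first observation initialises the level
            self.level = value
            return
        error = value - self.level  # how far off the last probable forecast was
        self.mae = self.alpha * abs(error) + (1 - self.alpha) * self.mae
        self.level += self.alpha * error

    def forecast(self, horizon: int) -> List[Tuple[float, float, float]]:
        """Return (low, probable, high) for each of the next `horizon` periods."""
        if self.level is None:
            raise ValueError("no data ingested yet")
        scenarios = []
        for h in range(1, horizon + 1):
            # Heuristic: uncertainty widens with the square root of the horizon.
            spread = self.width * self.mae * h ** 0.5
            scenarios.append((self.level - spread, self.level, self.level + spread))
        return scenarios
```

Each call to `update` stands in for new data arriving, so the next `forecast` call already reflects it. A real system would use far richer models; the point is only that both the probable line and the scenario band are recomputed from the data as it changes, with no manual inputs.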

Figure 2 — Forecasts generated using different data inputs: (a) Forecasting from Financial Statement Data (three-way forecasting) — Easy difficulty case (b) Forecasting from Transactional Data — Moderate difficulty case

If you take a look at Figure 2, you’ll see the results of our test comparing (a) a forecast built from financial statement data (the typical three-way forecasting method) with (b) a forecast built from transactional data. The first method (a) notably “smooths” the likely future, creating a non-representative forecast with huge variance ranges. Because this kind of forecast offers such a wide margin of what could happen, it’s hard to get a clear view of the future and take appropriate action. The second method (b) is considerably more realistic, reflecting what the business has experienced and is experiencing, as told by the data. It captures both volatility and cyclicality, even though transactional data is the more difficult dataset to forecast from. With the richer information in the transactional data, the user has the means to make better-informed, data-driven decisions.

How Our Cognitive Cash Flow System Works — A Small Peek Behind the Curtain

Let’s take a look at an accounts receivable (AR)-based cash flow forecast, which is just one of many machine learning models working together behind the scenes in our system. Typically, people forecast the cash components of AR using the balance sheet as their primary point of reference. They calculate AR days and apply this number to forecasted sales to generate a future AR balance. Then, they translate changes in AR into future cash inflows. Others might apply payment terms (upon receipt, NET 30, NET 60, etc.) to outstanding invoices in an effort to estimate when cash is likely to come in. These approaches are high-level and depend on several proxies and very generic assumptions about the inputs that drive cash flow. Creating forecasts this way is also very time-intensive.
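For contrast, the conventional balance-sheet approach described above fits in a few lines. This is a minimal sketch assuming a single period, credit sales only, and a fixed mapping from payment terms to days; the function names and the `TERMS_DAYS` table are illustrative. Every output hinges on one blended AR-days figure or on the stated terms, not on how each customer actually pays.

```python
from datetime import date, timedelta

# Days until payment implied by common terms (assumed mapping).
TERMS_DAYS = {"upon receipt": 0, "NET 30": 30, "NET 60": 60}


def ar_days(ar_balance: float, period_sales: float, days_in_period: int) -> float:
    """Days of sales tied up in receivables (also called DSO)."""
    return ar_balance * days_in_period / period_sales


def forecast_ar_cash(opening_ar, forecast_sales, days, days_in_period):
    """Hold AR days constant to project closing AR, then back out cash collected.

    Cash collected = opening AR + new credit sales - closing AR.
    """
    closing_ar = days * forecast_sales / days_in_period
    return closing_ar, opening_ar + forecast_sales - closing_ar


def expected_receipt(issued: date, terms: str) -> date:
    """Terms-based alternative: assume every invoice is paid exactly on its due date."""
    return issued + timedelta(days=TERMS_DAYS[terms])


dso = ar_days(ar_balance=50_000, period_sales=150_000, days_in_period=90)  # 30.0
print(forecast_ar_cash(50_000, 180_000, dso, 90))                           # (60000.0, 170000.0)
print(expected_receipt(date(2019, 9, 3), "NET 30"))                         # 2019-10-03
```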

When we began building our AR-based cash flow model, we started with a hypothesis about what we would expect to see in the AR data narrative (shown in Figure 3). As we moved forward with the project, our data science team was able to determine that there are five core data points (or dimensions) that impact predictions about when cash is likely to be received.

The core data points are:

1) Customer — each specific customer

2) Invoice Amount — the total amount of the invoice

3) Days to Pay — how long it takes to pay a certain invoice

4) Cyclicality — the time of the year the invoice is paid

5) Temporality — the entire history of invoice payments
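One simple way to hold these five dimensions together is a record per paid invoice. The sketch below assumes a hypothetical `PaidInvoice` schema with issue and payment dates: days to pay and the month of payment are derived from those dates, and temporality comes from keeping each customer’s full, time-ordered history.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class PaidInvoice:
    """One historical, fully paid invoice (hypothetical schema)."""
    customer: str   # 1) Customer
    amount: float   # 2) Invoice Amount
    issued: date
    paid: date

    @property
    def days_to_pay(self) -> int:
        """3) Days to Pay: calendar days from issue to payment."""
        return (self.paid - self.issued).days

    @property
    def paid_month(self) -> int:
        """4) Cyclicality: the time of year (month, 1-12) the invoice was paid."""
        return self.paid.month


def customer_history(invoices: Iterable[PaidInvoice], customer: str) -> List[PaidInvoice]:
    """5) Temporality: one customer's entire payment history, oldest payment first."""
    return sorted((inv for inv in invoices if inv.customer == customer),
                  key=lambda inv: inv.paid)
```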

As we began to develop a greater understanding of the role each dimension plays in determining when cash would be collected in the AR area, our expectations about the data changed radically.

We initially believed that only the first three dimensions were truly important to consider. This assumption is reflected in the second graph in Figure 3, which shows days to pay increasing steadily with invoice amount, every single time. However, the complete data story is much more accurately illustrated by the graphs in Figure 4. Adding cyclicality as a core data point shattered the assumptions reflected in Figure 3. We quickly learned that taking cyclicality into consideration plays an integral role in accurately predicting cash flow.

We saw that, for certain customers, the time of year that an invoice was issued significantly affected their total days to pay. Businesses offering seasonal services such as landscaping or snow removal often fall into this pattern, as they naturally and fairly reliably have high-income and low-income periods depending on the season. Based on these findings, we built our system to take into account that an invoice issued to a greenhouse, for example, may take a few extra weeks to be paid if it is issued in a low-income winter month like January.

Figure 3 — AR Narratives Considering Dimensions 1–3 — Each blue dot represents one customer’s payment.

Figure 4 — AR Narratives Considering Dimensions 1–3 & Dimension 4 (Cyclicality) — Each blue dot represents one customer’s payment. The grouping of the dots represents the time of year the payment was made. Payments made in August happened sooner than payments made in January, regardless of invoice amount.
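To make the effect of cyclicality concrete, here is a minimal sketch, an assumption rather than the production model, that estimates a customer’s days to pay from their past invoices issued in the same calendar month and falls back to their overall average when that month has too little history. The tuple layout and the `min_obs` threshold are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def seasonal_days_to_pay(history, min_obs=2):
    """Build a predictor of days to pay that accounts for cyclicality.

    history: iterable of (customer, issue_date, days_to_pay) tuples.
    The returned function uses the customer's average for invoices issued in
    the same calendar month when at least `min_obs` such invoices exist, and
    otherwise falls back to the customer's overall average.
    """
    by_month = defaultdict(list)
    overall = defaultdict(list)
    for customer, issued, days in history:
        by_month[(customer, issued.month)].append(days)
        overall[customer].append(days)

    def predict(customer: str, issued: date) -> float:
        if customer not in overall:
            raise KeyError(f"no payment history for {customer!r}")
        same_month = by_month.get((customer, issued.month), [])
        if len(same_month) >= min_obs:
            return mean(same_month)
        return mean(overall[customer])

    return predict


history = [
    ("Greenhouse", date(2018, 1, 10), 60), ("Greenhouse", date(2019, 1, 12), 70),
    ("Greenhouse", date(2018, 8, 3), 20), ("Greenhouse", date(2019, 8, 5), 30),
    ("Greenhouse", date(2019, 3, 1), 40),
]
predict = seasonal_days_to_pay(history)
```

With this made-up history, a new January invoice is expected to take 65 days and an August invoice 25 days, while a March invoice (only one past example) falls back to the customer’s overall average of 44 days.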

We also noticed something that, in retrospect, was very intuitive: a customer’s more recent behaviour tended to be more indicative of their future behaviour. If the system only accounted for a customer’s average days to pay over their entire lifecycle, it wouldn’t reflect the fact that, for example, the customer had started paying their invoices much sooner (or later) than in the past, for one reason or another. An effective system needs to recognize and learn these changes in customer behaviour so that the forecast is more representative of current realities and, in turn, more useful for future decision-making. When building forecasting models, it’s therefore important to determine whether temporality is actually a predictive dimension for a given business; again, that should not be assumed in every case.
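A common way to let recent behaviour dominate is an exponentially decaying weight on older invoices. The sketch below is an assumed illustration, not the system’s method; `recency_weighted_days_to_pay` and its `half_life` parameter (measured in invoices) are hypothetical.

```python
from statistics import mean


def recency_weighted_days_to_pay(days_history, half_life=3.0):
    """Recency-weighted average days to pay.

    days_history: a customer's days-to-pay values, oldest first.
    An invoice's weight halves for every `half_life` newer invoices, so recent
    behaviour dominates; a very large half_life approaches the lifetime average.
    """
    if not days_history:
        raise ValueError("empty payment history")
    n = len(days_history)
    weights = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
    return sum(w * d for w, d in zip(weights, days_history)) / sum(weights)


history = [60, 60, 60, 60, 20, 20, 20]   # customer recently started paying much faster
print(round(mean(history), 1))                                         # 42.9
print(round(recency_weighted_days_to_pay(history, half_life=1.0), 1))  # 24.7
```

The lifetime average (about 43 days) still reflects the old habit, while the weighted estimate (about 25 days) tracks the new one. Whether that is better depends on whether temporality is actually predictive for the customer, which is why `half_life` would itself need to be tuned against held-out data.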

Built to Account for Differences Across Businesses

We’ve designed our forecasting system to consider the importance of data points in each unique business context, not to apply a one-size-fits-all model to individual business narratives. The system recognizes that not every dimension is important to every business. One of the most critical underpinnings of the system is that as new data points are generated through day-to-day business operations, the system automatically recognizes these changes, learns from them, and re-adjusts its models if necessary. This means that the system is continuously improving its own capabilities by including and adapting its new learnings over time as it produces the rolling forecast.
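The article doesn’t say how the system decides which dimensions matter for a given business. One common way to make that call, shown here purely as an assumed illustration, is to let holdout error decide: fit a model with and without a dimension on earlier data and keep the dimension only if it clearly improves accuracy on later data. The names `_fit` and `cyclicality_helps` and the 5% threshold are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def _fit(train, use_month):
    """Average days to pay per customer, or per (customer, month) if use_month."""
    groups, overall = defaultdict(list), defaultdict(list)
    for customer, month, days in train:
        overall[customer].append(days)
        groups[(customer, month) if use_month else customer].append(days)

    def predict(customer, month):
        key = (customer, month) if use_month else customer
        # Fall back to the customer's overall average for months not seen in training.
        return mean(groups[key]) if key in groups else mean(overall[customer])

    return predict


def cyclicality_helps(train, test, min_gain=0.05):
    """True if adding the month dimension cuts holdout MAE by more than min_gain.

    train/test: lists of (customer, month, days_to_pay); test should come from
    a later period than train, and its customers must appear in train.
    """
    def holdout_mae(predict):
        return mean(abs(predict(c, m) - d) for c, m, d in test)

    without_month = holdout_mae(_fit(train, use_month=False))
    with_month = holdout_mae(_fit(train, use_month=True))
    return with_month < without_month * (1 - min_gain)
```

Scoring on held-out data matters here: on the training data itself, adding a dimension almost never looks worse, so every business would appear seasonal.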

Our Validation Process

There are a number of ways to validate the efficacy of a forecast. The most important thing to ensure is the predictive power of the model. To verify that our models provide an accurate and usable forecast, we used data from 500 randomly selected data sources and temporarily programmed the forecasting system to assume that the “present” was three months prior. We then trained the models to predict the forthcoming three months based on the historical data available up to that point. Since we already had the actual data from those three “forecasted” months, we were able to compare what actually happened with what our system predicted.
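The holdout procedure described above is a standard backtest. Here is a minimal sketch, assuming each data source is a list of (date, value) observations: move the “present” back three months, give the model only what it would have seen then, and keep the rest as actuals to score against. `months_before` and `backtest_split` are illustrative names.

```python
import calendar
from datetime import date


def months_before(d: date, months: int) -> date:
    """Same day `months` calendar months earlier, clamped to that month's last day."""
    index = d.year * 12 + (d.month - 1) - months   # months since January of year 0
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def backtest_split(series, months=3):
    """Pretend the "present" was `months` months before the last observation.

    series: list of (date, value) observations for one data source.
    Returns (cutoff, history, actuals): the model is trained only on `history`
    and its forecast is then scored against `actuals`.
    """
    last = max(d for d, _ in series)
    cutoff = months_before(last, months)
    history = [(d, v) for d, v in series if d <= cutoff]
    actuals = [(d, v) for d, v in series if d > cutoff]
    return cutoff, history, actuals
```

The key property is that nothing after the cutoff leaks into training; in a full evaluation the split is repeated for every sampled data source.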

We found that, in nearly every case, the actual data points fell within the forecast range (between the high and low estimated scenarios). Across those same datasets, our probable forecast fell within a Mean Absolute Percentage Error (MAPE)* of 20% of the actual recorded data. Considering that we used a randomized sample with no data cleaning, we were excited, and admittedly surprised, by this level of accuracy. To better understand why errors occurred, we dug deeper into the cases where they arose. We discovered two conditions that adversely affected a forecast’s accuracy: bad data quality and information asymmetry. We believe that by mitigating these conditions, we can bring the MAPE down into the single digits.

*We used MAPE as a convenience measure to quickly gauge and measure forecast efficacy even though Makridakis (1993) rightly argued that MAPE in time-series forecasting underestimates the accuracy of the forecast results.
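The two checks reported above can be computed directly from a backtest’s held-out actuals. A minimal sketch with illustrative function names: interval coverage is the share of actuals that land between the low and high scenarios, and MAPE is 100/n × Σ|(Aₜ − Fₜ) / Aₜ| over the probable forecast Fₜ.

```python
def interval_coverage(actuals, lows, highs):
    """Share of actual values that landed inside the [low, high] forecast range."""
    if not (len(actuals) == len(lows) == len(highs)) or not actuals:
        raise ValueError("need equal-length, non-empty sequences")
    hits = sum(lo <= a <= hi for a, lo, hi in zip(actuals, lows, highs))
    return hits / len(actuals)


def mape(actuals, forecasts):
    """Mean Absolute Percentage Error in percent: 100/n * sum(|(A - F) / A|)."""
    if len(actuals) != len(forecasts) or not actuals:
        raise ValueError("need equal-length, non-empty sequences")
    if any(a == 0 for a in actuals):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - f) / a) for a, f in zip(actuals, forecasts)) / len(actuals)
```

Because net cash flow can be zero or close to zero in some periods, MAPE can blow up or become undefined on cash data, one more reason to treat it as a convenience measure.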

Using Bad Data

Through our verification process, we found that, when it comes to data, it’s garbage in, garbage out. Not surprisingly, attempting to produce a data-driven forecast from bad data inevitably results in less accurate output, which then leads to poorer decision-making. As Gartner’s Colleen Graham said, “Data quality requires a certain level of sophistication within a company to even understand that it’s a problem.” Before attempting any kind of cash flow forecasting, it’s absolutely critical to make sure that the data you’re using is meaningful and accurate.
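What “meaningful and accurate” means in practice depends on the accounting system, but a few cheap checks catch common problems in invoice data. The sketch below is a hypothetical example, assuming (invoice_id, customer, amount, issued, paid) tuples; it flags records for review rather than silently dropping them, since, for example, a negative amount might be a legitimate credit note.

```python
from collections import Counter
from datetime import date


def data_quality_issues(invoices):
    """Flag common problems in (invoice_id, customer, amount, issued, paid) records.

    `paid` is None for open invoices. Returns a list of (invoice_id, problem);
    a duplicated id is reported once for each record that carries it.
    """
    issues = []
    id_counts = Counter(inv[0] for inv in invoices)
    for inv_id, customer, amount, issued, paid in invoices:
        if id_counts[inv_id] > 1:
            issues.append((inv_id, "duplicate invoice id"))
        if not customer:
            issues.append((inv_id, "missing customer"))
        if amount <= 0:
            issues.append((inv_id, "non-positive amount"))
        if paid is not None and paid < issued:
            issues.append((inv_id, "paid before issued"))
    return issues


records = [
    ("INV-1", "Acme", 500.0, date(2019, 6, 1), date(2019, 6, 20)),
    ("INV-2", "Acme", -40.0, date(2019, 6, 5), None),
    ("INV-3", "", 250.0, date(2019, 7, 1), date(2019, 6, 28)),
]
for invoice_id, problem in data_quality_issues(records):
    print(invoice_id, problem)
```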

Information Asymmetry

One question that comes up is: “How can you predict an incoming bank loan, a sporadic dividend payment, or a capital infusion into the business?” Our answer is: we can’t. The system can’t compensate for information asymmetry, which refers to information that one party knows and another does not, or in our case, something the business owner or advisor knows and the machine does not. It’s nearly impossible for any system to build an accurate forecast when many non-predictable events are at play, because by their very nature these events can’t be predicted. Such events were the major source of the forecast error described above.

We do realize that non-predictable sales, expenditures, or miscellaneous events occur from time to time and need to be taken into consideration for a forecast to be useful. With this in mind, we implemented a feature that allows manual adjustments for these situations, and we are eager to see the probable forecast error results once events that contribute to information asymmetry are taken into account.
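The article doesn’t describe how this adjustment feature works internally. One way to keep such inputs from contaminating the learned forecast is to store them separately and overlay them on its output. The sketch below assumes the forecast is net cash flow per period (in a running-balance forecast, an event would instead carry forward into every later period); `apply_known_events` and the `Scenario` alias are hypothetical.

```python
from typing import Dict, List, Tuple

Scenario = Tuple[float, float, float]   # (low, probable, high) net cash flow for one period


def apply_known_events(scenarios: List[Scenario], events: Dict[int, float]) -> List[Scenario]:
    """Overlay known one-off cash events on a learned forecast.

    scenarios: (low, probable, high) net cash flow per future period.
    events: period index -> signed amount (e.g. +50_000 for an expected loan,
            -20_000 for a planned dividend). The event is treated as certain,
            so the same amount shifts all three scenarios.
    Returns a new list; the unadjusted forecast is left intact for comparison.
    """
    adjusted = list(scenarios)
    for period, amount in events.items():
        if not 0 <= period < len(adjusted):
            raise IndexError(f"event period {period} is outside the forecast horizon")
        low, probable, high = adjusted[period]
        adjusted[period] = (low + amount, probable + amount, high + amount)
    return adjusted
```

Keeping the adjusted and unadjusted forecasts side by side means the data-driven narrative is never overwritten, and it stays possible to measure how much the manual inputs changed the outcome.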

To be clear, our intention was never to create a platform for extensive manual scenario modeling. The power of chata.ai’s system lies in preserving the integrity of the real data, the true narrative of the business. If users are particularly keen on manually manipulating assumptions to showcase potential outcomes, there are plenty of other great cash flow forecasting tools that are designed to offer this functionality.

Looking Toward the Future of Cognitive Cash Flow Forecasting

We see potential in our tech integrating well with other forecasting software that offers more flexibility in the manual alteration of assumptions and scenario modeling. Our plan is to eventually expose the system we’ve built to interested developers who can augment their app offerings to include chata.ai’s objective, automated forecasting functionality.

We also believe it’s very important to let users gauge the accuracy of the forecasts by comparing them to the actual data. We plan to implement a “forecasting report card” that will show, after the fact, exactly how well the cognitive forecast predicted what happened. The forecasting community has not done a good job of being transparent, and we hope to change that.

In the meantime, we’re looking forward to launching this new feature and seeing how our users take advantage of the system to make an impact for their clients or on their own business. Remember: CFF is about decision-making, and if the forecasting process is automated, you have much more time to spend on the decision instead of on the forecast. If you want to test out Cognitive Cash Flow Forecasting for yourself, get connected at chata.ai.