Scalable and Repeatable Modeling for Decision Making
My last post ended a paragraph with the following sentence: In many ways the value has been extracted by getting the analysis and decision making out of Excel and into repeatable and scalable models — more about that in a future post. Here’s that future post.
An important part of doing any task is having the right tool(s) to complete the task well. Have you ever tried using a hammer when you really needed a screwdriver? It’s quite demoralizing. Or heard the saying “Everything looks like a nail when all you have is a hammer”? I think Alanis Morissette was onto something when she sang “It’s like ten thousand spoons when all you need is a knife”.
In today’s environment managing, transforming, understanding, and gaining insights with data is of utmost importance. Making data driven decisions has become the new currency in the current competitive marketplace.
I’m about to go into a rant against Microsoft Excel and this should stand as a disclaimer: Excel is one of the most robust, ubiquitous, and useful tools we have, but it’s a hammer when sometimes you need a screwdriver or ten thousands spoons when you need a knife. This isn’t meant to be a hit-piece on Excel but I do want to strongly point out the importance of having a wide set of tools in the tool box, more than just a 10,000 spoons (Excel) when all you need is a knife.
When I wrote about the importance of getting analysis and decision making into repeatable and scalable models and out of Excel, I chose my words very carefully — Repeatable and Scalable.
In my experience scalability is very important and probably the easier of the two to define. I’ve run into too many modeling scenarios or analysis needs that exceeded the row-limits of Excel. For example trying to pull in some sort of order history or transactional data over a large period of time can quickly outpunch the available 1.04 million rows Excel supports. Excel simply doesn’t allow for the scalability that’s required in the demanding environments of today. On top of the row-limit, the amount of data within the rows is growing. So instead of some simple customer data as an attribute to information about a set of orders, you now can begin to build large data structures of rich attributes that can be hard to crunch and consume in Excel.
The word Repeatable is defined as ‘suitable to be done again’. As humans we are creatures of habit and we need to be able to repeat tasks or analysis as the environment changes around us. In my last post I talked about how I worked with a team to create a tool to assist with determining inventory settings, particularly around safety stock and replenishment parameters. The tool we created could have been done in Excel…but it would have been clunky for a couple of reasons. First, Excel would have had trouble pulling all the related information together and would have met computational limitations with the amount of data paired with the complexity of formulas we’ve become accustomed to writing. Second, Excel is largely freeform and there’s no strict formatting or table structure requirements when creating a tool. So if you plan to replicate pulling and dumping from source systems to update inputs, Excel won’t provide the integrity that is required for such a task. Extract, transform, load (ETL) is one of the most important tasks in the process of data analysis. For the inventory application, instead of creating an Excel tool we utilized a platform and one of the key benefits of using a platform is the integrity of formatting and data governance. There are checks in place when uploading data into a platform:
- Datatype integrity — integers where you would expect integers and alphanumeric where you would expected alphanumeric and not visa versa. No special characters that can cause algorithms to trip up.
- Data/table structure integrity — the correct number of columns in the input table. If you had the table set for 12 columns and you tried to upload 13 there would be an error message.
In my experience having a platform or a tool beyond Excel to create scalable and repeatable modeling is a must. Python is likely the most versatile option for a variety of applications or uses. It will lead to more robust and accurate solutions. In turn, decisions should become more data informed and therefore results should be more optimal. Make sure you have more than 10,000 spoons in the cupboard, always keep a knife, heck maybe even a few sporks. It’s important to have different tools in the toolbox in today’s demanding environments — a diverse toolset is vital.