Same as Six-Sigma?
A lot of folks in the industry ask this question — haven’t we done this before? After all, we have had Six-Sigma, Lean, etc. for decades now. These movements had some similarities to Data Science: we turned business problems into mathematical or statistical problems and solved them with a rigorous approach instead of relying purely on experience or ad-hoc experiments and inferences. Six-Sigma and Lean were largely limited to manufacturing, although spillovers into other areas, like Lean Marketing, Lean Finance, and Lean Supply Chain, were also seen. In any case, data was collected (usually manually), all the number crunching was done at desktops by analysts, and results were presented to the business as recommendations and implemented by the business functions.
Looking Back at ERP, APS, and BI
ERPs were the first ‘Enterprise Apps’ on the scene. They manage all transactions in the enterprise and keep them in sync, e.g., when material is received at a warehouse from a supplier, transactions are created for logistics, inventory, accounting, etc., and all of these remain consistent.
Advanced Planning and Scheduling (APS) applications were the first “data science” applications, pioneered by software companies like Numetrix, i2, Manugistics, etc. They crunched millions of variables using optimization routines and produced optimal plans for planning problems in areas like manufacturing and logistics. Gradually, such models were extended to almost every area of business, such as Yield Management, Revenue Management, etc.
Together, ERPs and other enterprise applications created vast amounts of data. Enterprises realized that by slicing and dicing this data, especially with cross-correlations across silos, they could generate a lot of intelligence. Tools that supported this process of warehousing, slicing and dicing, and reporting came to be known as Business Intelligence (BI) tools.
Why Wasn’t This Enough?
We had Transactional, Planning, and BI tools, all solving specific problems. However, there was a growing feeling that these were inadequate as (1) the volume, velocity, and variety of data started increasing exponentially, e.g., as enterprises got into mobile customer apps and digital marketing, and (2) the problems that businesses wanted to solve became less and less standard.
Enterprises have struggled (and continue to do so) with implementing various special-purpose planning systems. The failing of this approach was that business problems were too dynamic and changed quickly over time, with new aspects of the problems revealing themselves often. As an example, companies that wanted to do forecasting would standardize on an enterprise forecasting methodology (and implement it in an application/system), only to find that certain products, departments, lifecycle stages, etc. would be better served by separate, very different approaches to forecasting. One-size-fits-all just wouldn’t work, and this led to tool proliferation and bloated, high-maintenance software.
And Then Came the Opportunity
Three big things happened in the last decade that opened a new approach for building data-driven applications:
- On-Demand Infrastructure, i.e., cheaper storage and fast processing on cloud
- Advances in Statistical Learning and Computational Solutions
- Open-Source Software for Data Science
With the cloud, companies no longer have to buy expensive computers. All they need to do is rent the machines they need for the duration of use, and pay only for that use. This trend enabled a lot of experimentation that would otherwise not have happened, e.g., bigger volumes of data could be processed on big-data/Hadoop clusters that could simply be rented.
In the last ten years, algorithms like Random Forests and Deep Neural Networks became computationally tractable, driven both by newer algorithms and by greater computing power. Problems previously considered too large to attempt could now be solved in a short time.
The final piece of the puzzle, and probably the most important one, was that all these computing and data science advances became accessible to academia, freelancers, and companies small and big, for free, through robust open-source software. Many companies like Google and Microsoft now routinely release machine learning tools as open source. By comparison, earlier commercial tools (e.g., SAS) were out of reach for all but large companies due to extremely high software costs, on the order of thousands of dollars per seat license.
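As a small, hypothetical illustration of this accessibility: with the open-source scikit-learn library, a few lines of Python are enough to train and evaluate a Random Forest classifier on a synthetic dataset standing in for enterprise data. (The dataset and parameters here are illustrative, not from any real business problem.)

```python
# Sketch: training a Random Forest with free, open-source tools.
# Uses a synthetic dataset as a stand-in for real enterprise data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Generate an illustrative classification dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Hold out a test set to estimate out-of-sample accuracy.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fit an ensemble of 100 decision trees.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

A decade earlier, this kind of modeling capability typically came bundled inside expensive commercial analytics suites; today it is a free download.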
In essence, these developments dramatically reduced the cost for companies to be in the business of developing software applications. Small teams and businesses can now access the same or better tools than those of large software companies, and this has led to a significant wave of innovation.
So, What Can Data Science Do?
The above developments collectively have created a big opportunity for Data Science. Leading data science teams in companies have demonstrated that:
- Enterprises have numerous problems that can be solved cheaply, and at scale, using Data Science. Enterprise Applications are not always the solution, and may often become part of the problem instead
- A dedicated team of non-domain-specialists, aka ‘data scientists,’ can solve a broad set of problems within the company by carefully mining the mountains of data that already exist
- There is great benefit to solving such problems in-house rather than merely outsourcing the process. The process is as valuable as its results, primarily because it spawns and amplifies innovation internally
- The technology investment required to embark on this journey is far less than it has ever been in the history of technology. Tools are getting faster, better, and cheaper by the day
It is beyond the scope of this post to exhaustively list applications of data science. However, it is important to mention that data science finds numerous applications in all areas of business, e.g., marketing, manufacturing, CRM, demand planning, finance, etc., and tackles problems involving not only numerical data but also image, text, voice, and video data. The applications are limited only by the scope of the business’s activities.
Take Away
Depicted below is a schematic from the industry research group Gartner that explains the range of questions that can be answered by data science, in the form of a maturity model. It is crucial for companies to realize that their data science teams can use the mountains of data sitting inside their four walls, and even leverage external data, to answer critical questions and develop new capabilities in virtually every area of the business. Many companies have recognized this tectonic shift in the industry and are mandating at least data science awareness across every department and layer of their business.
In our next post, Data Science — The Opportunity for Enterprises, we pick up the thread from here and discuss why enterprises have such a tough time embracing data science.
Author: Ananth Krishnamoorthy
Please do write to us with your views and comments. If you are a company or startup looking for help with machine learning, we’d be more than happy to help — just drop us a line and we’ll get back to you.