A CXO’s biggest challenges in a Machine Learning project
Written by Nisha Shoukath - November 27, 2018
Recently, I met the Chief Data Officer of a successful HealthTech company at an ML conference in New York. We sat through some very interesting talks and ML solution demos by great speakers. Out of curiosity, I asked him what challenges his team typically faces when starting a new Machine Learning project. Not surprisingly, here is what I discovered:
#1. Data management is complex
Most of their systems are not connected. There are many data sources, and data is stored in various formats across multiple databases. When the data scientists ask for data, the tech teams often do not know where it lives, or even whether it exists from past work. A lot of resources go into these tasks, and it takes too much time and too many back-and-forth cycles to get a decent data set. It is no surprise that this repeated data gathering and preparation ends up being the longest and most expensive of all the ML implementation steps.
#2. Delays & Time-to-Insight
From the time the project is kicked off, it simply takes too long to get any actionable insights for the business. Business folks are used to agile deliveries from tech teams, and they expect the same from AI teams. But there is a long cycle of setup, data preparation, model definition, testing and so on, and by the time the team is able to present any insights to the business, it is already too late.
#3. Infrastructure cost
Scalable infrastructure is needed to implement and run AI models, so a sizeable up-front investment is required. As more and more models are tested and re-tested, this cost tends to shoot up. Most often, it is not easy to justify this cost, especially when there are spikes.
#4. Expensive resources
It is well known that ML requires data scientists, and they are expensive. Beyond these specialized SMEs, the nascent nature of ML projects, the team's inexperience, and the absence of any good unified platform for ML workflows also create the need for several engineers to build the platform and handle data management and release management.
#5. Poor visibility
It is hard to gain any real visibility into progress until very near the end, or to ascertain whether the project is really moving in the right direction. Usually it is a ‘black box’ from start to finish, because there is no single system or portal that shows the progress of activities, quality indicators, or performance indicators. Such visibility is what helps to govern the project, assess the risks, and provide realistic updates to sponsors and business stakeholders. Without it, the team cannot ‘fail fast’ and adjust; instead it goes “back to the drawing board” every time.
#6. Build vs. Buy conundrum
There is a push from the executive sponsors to get early ROI, even if it means going out and on-boarding onto ready-made solutions. After all, ML is only a means to an end: the real goal is a desired business outcome (a feature) powered by ML, and ML itself is not an outcome for the business. They want to be an early entrant into the ML space and realize business outcomes fast. Currently, however, they have no choice but to build expensive foundations, as there is a lack of industrial, ready-made solutions that make it easy to automate ML workflows.
He mentioned that at least 80% of the project effort goes towards non-core activities like setting up infrastructure, data preparation and technical foundations, rather than towards the core data science and algorithms that solve the actual business problem. Hopefully this will change in the next decade, when better ready-to-use platforms are available.
My final thought as I left the conversation was that there is definitely a gap in the market for an enterprise-grade platform that helps automate machine learning workflows seamlessly, so that focus and investment can be channelled towards solving the core business problems rather than building technical foundations.