7 Reasons Why Most AI Projects Never Make It to Production
Part one of two
Market research points to steady growth in the share of AI projects being deployed to production. While Deloitte in 2019 still found 91% failing to meet expectations, McKinsey saw only 64% not continuing beyond the pilot stage in 2021, and Gartner only 46% in 2022.
This is in line with my experience, first as an AI researcher for several years and then, over the past five years, as a consultant for a specialized AI service company. The industry is maturing, with ever more applications actually being implemented and used. The absolute numbers seem overly optimistic though, even for recent years.
In my experience, at least 80% of AI projects are stopped before they reach production. So when I see the above numbers, my first reaction is: the success rates are inflated! And they probably are. One reason may be that the estimates are based on self-reporting (surveys). Enthusiasts tend to respond to surveys more frequently than skeptics, and there is an overall tendency towards desirable answers. On the other hand, a ‘pilot’ is also a later stage than the Proof-of-Concept (PoC), which is where I start counting, meaning I start from a broader base.
Anyhow, the rest of this blog post is not about the number itself but about the reasons why AI projects fail. Because of confidentiality, I cannot always go into detail, but for each reason I provide context and potential solutions. This first part deals with four reasons, while the second will present three more and a general discussion.
- Because they are not supposed to
- Wrong use case
- Data is hard
- ‘Maybe it’s too early’
[Part two can be found here]
1. Because they are not supposed to
AI projects are more similar to business initiatives than to IT implementations. In a sense they are more like a sales campaign than an implementation of, say, a new CRM system. This is because they come with multiple levels of uncertainty (data, algorithm, user acceptance…). Even after careful study and consideration, you are never entirely sure in advance what will work and what will not.
For this reason, it is advisable to start with an experimentation phase during which, in rapid succession, ideas are conceived, tested, and proof-of-concept demonstrators built for the most promising cases. 90% of ideas should probably already have been killed before the PoC stage. A PoC should typically cost no more than 10–20 engineering days, and the goal is to test a hypothesis, usually that this or that result can be achieved by model X trained on data Y. A simple Jupyter Notebook is often enough at this stage, as energy should be focused on metrics and performance rather than looks or user-friendliness.
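In practice, the hypothesis check at the heart of such a notebook can be surprisingly small. Below is a minimal sketch of what it might boil down to; the public breast-cancer dataset, the random-forest baseline and the 0.85 F1 target are illustrative placeholders, not a recommendation for any specific project.

```python
# A minimal PoC-style hypothesis check: train a baseline model on the
# available data and see whether it clears the target metric agreed on
# with the client. Dataset, model and target value are placeholders.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
score = f1_score(y_test, model.predict(X_test))

TARGET_F1 = 0.85  # hypothetical target agreed on with the client
verdict = "supported" if score >= TARGET_F1 else "not supported"
print(f"F1 = {score:.2f} -> hypothesis {verdict}")
```

The point is not the model but the verdict: a clear, measurable answer to whether the hypothesis holds.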
The main goal in this phase is to learn which use cases and approaches promise the highest benefit-cost ratio. This means that the intermediary results should look highly promising while the margin of improvement is still high at relatively low projected cost. Ideally, most (over 50%) of these PoCs are dropped, with 80% not being uncommon. Failure is good at this stage! It does not mean that money is lost but rather that knowledge is gained and that a future, much larger investment can be targeted more accurately towards a higher return on investment. At this early stage, 50–80% of projects should be killed, but many executives and managers do not have the nerve for this and will over-commit and scale up prematurely.
Solution: Accept it! This is why you build PoCs. It’s a learning and selection process. Try not to over-commit too early.
2. Wrong use case
AI is not a solution to every problem. In fact, there are clear limitations as to what AI can do and what it cannot. Usually a very bad starting point for a project is when a client comes to you with a problem they cannot solve because there are too many variables involved or there is too much variability, and they hope that AI will be able to pull it off. There are certain use cases such as recommender systems and various estimators which indeed fit this description, but the large majority of successful AI use cases emulate human performance rather than surpass it. A thorough understanding of a problem and how to solve it (without AI) is usually a prerequisite for a successful AI project.
Another common problem is that the use case does not allow for any margin of error. AI systems are probabilistic: their performance improves as they are trained and refined, but in many cases they will never reach a 100% success rate. Additionally, many AI algorithms are not deterministic, so you cannot reliably tell when they are wrong. Hence, ideally the use case you are tackling allows for a margin of error, which often means that the system provides value for the user even when not all results are correct. This is why, in many cases, aiming for a human-in-the-loop system before full automation is a good idea.
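To make the human-in-the-loop idea concrete, here is a minimal sketch of a confidence gate; the threshold value and the label/confidence inputs are hypothetical and would come from whatever model the project uses.

```python
# A minimal sketch of a human-in-the-loop gate: confident predictions are
# automated, uncertain ones are routed to a person. Threshold is illustrative.
CONFIDENCE_THRESHOLD = 0.9

def handle_prediction(label: str, confidence: float) -> str:
    """Route a single model output either to automation or to human review."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return f"auto-accepted: {label}"
    return f"sent to human review: {label} (confidence {confidence:.2f})"

print(handle_prediction("defective", 0.97))  # handled automatically
print(handle_prediction("defective", 0.55))  # a human checks this one
```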
A third problem I have come across is that the chosen use case is too marginal to the actual business of the client. Often, under pressure from shareholders or management, the decision is made that “we should do something with AI because it is the future,” but there is no clear strategy or commitment. In these cases, some marginal process is selected, far from the core business of the company, and a project is initiated. AI projects do not come cheap though, both in terms of financial and strategic investment. In most of these cases, even when the PoC is successful, it becomes clear to the main client stakeholder that it will still be hard to gather enough support within the organization to push the project forward, as the benefit is deemed too limited compared to the investment and/or the changes in practices required.
Solution: Inform yourself before you commit to a use case. Start with a broad analysis in which you consider many options, and involve someone with the right expertise in the choice process. Make sure they are impartial as to the outcome of the decision process though. Tech enthusiasts looking forward to working on the project may paint too rosy a picture and downplay the potential hurdles ahead.
3. Data is hard
Until recently, when large, pretrained foundation models started to be used more frequently, this was by far the number one reason why AI projects did not realize their potential. A typical scenario would be that a client has a feeling that they are sitting on a huge pile of unique data that can be used to develop AI algorithms. In my experience, however, without proper assessment this feeling tends to be wrong more often than right.
Uncurated real-world data is usually junk. Take metadata: when human intervention is involved and there is no direct incentive to provide precise data, you should assume that all real-world data is incomplete and incorrect. This is rational. In most jobs, people are under pressure to generate value as efficiently as possible, so they will take any shortcut available to them. And they are applauded for it. Very often this means that data is collected and labeled just well enough for the immediate task at hand, and general bookkeeping is done minimally at best.
Also, AI algorithms are picky about their data, and not always in the way that a non-expert would expect. First of all, they hate missing values, so when your tables have many gaps in them, you can be sure that your model will complain. In most cases you will either lose the data points with missing elements or you will have to fill in the missing values based on some heuristic. Both of these hacks will reduce the performance of your AI model.
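As a small illustration, here is what the two workarounds look like with pandas and scikit-learn; the tiny table and its column names are made up for the example.

```python
# Two common workarounds for missing values: drop the incomplete rows,
# or fill the gaps with a heuristic (here: column medians).
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "sensor_a": [1.0, 2.0, np.nan, 4.0],
    "sensor_b": [10.0, np.nan, 30.0, 40.0],
})

dropped = df.dropna()  # option 1: lose the data points with gaps

imputer = SimpleImputer(strategy="median")  # option 2: fill in guessed values
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

print(dropped)
print(imputed)
```

Either way, the model ends up training on less information, or on partly invented information.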
Another issue is that AI algorithms want their data to be representative and therefore diverse. You cannot train an object detection model on images from a catalog when it is meant to be used, say, on the factory floor. The world is a complicated place and recognizing objects in an image depends on light, distance, camera, angle, etc. This means that for many applications, to reach adequate performance, you will need data points for all possible combinations. This is why training sets are often huge. It is not the size itself that is important but the degree to which it covers all real-world scenarios. This often implies that existing datasets need to be expanded for AI to reach the desired performance or that entirely new training sets need to be created.
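Data augmentation (part of the data work listed in the solution below) can simulate some of this variation synthetically. The sketch below uses torchvision transforms to vary framing, viewpoint, lighting and angle; it only partially substitutes for genuinely diverse real-world data, and the image file name is a placeholder.

```python
# A minimal augmentation pipeline that simulates some real-world variation.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),   # distance / framing
    transforms.RandomHorizontalFlip(),                      # viewpoint
    transforms.ColorJitter(brightness=0.4, contrast=0.4),   # lighting
    transforms.RandomRotation(degrees=15),                  # camera angle
    transforms.ToTensor(),
])

# from PIL import Image
# img = Image.open("factory_floor.jpg")  # hypothetical example image
# augmented_tensor = augment(img)
```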
Solution: Accept that you will have to get your hands dirty and that data work often constitutes 80–90% of the effort in a custom AI project. Data needs to be collected, cleaned, augmented, enriched, labeled and tested. In some cases, data can be purchased, but it usually does not come cheap and it can have implications for the ownership and running costs of your solution.
Finally, it should be noted that, with the rise of foundation models, the need for huge training datasets has diminished for some use cases. Foundation models like GPT or Stable Diffusion have been pre-trained on billions of data points (images, documents, etc.) and this knowledge can account for a lot of real-world variability. A pre-trained object detection model already knows what, say, a car looks like in different real-world contexts, so training it to recognize a specific, new type of car will require far fewer examples.
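As a rough illustration of what such an adaptation can look like, the sketch below swaps the classification head of a COCO-pre-trained detector from torchvision for a new one; the two-class setup (background plus the new car type) is illustrative, and the dataset and training loop are omitted.

```python
# Adapting a pre-trained detector to a new class: keep the pre-trained
# backbone, replace only the classification head.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Pre-trained on COCO, so it already "knows" common objects in varied contexts.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# New head: 2 classes = background + the specific car type we care about.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

# Fine-tuning this head typically needs far fewer labeled images than
# training a detector from scratch.
```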
4. ‘Maybe it’s too early’
This may seem an odd one, but it is one that occurs more often than you would think and one that is not always recognized as such. Some AI enthusiasts cannot stop talking about how amazing the development of AI has been in the past few years, and how fast everything is evolving. (I should know because I am one.) Enthusiasm is generally a good thing to get things done and to convince people to go through with a project. Sometimes, however, stressing that AI models are ever more rapidly growing more powerful can have the adverse effect that a client may just think: “Why should I bother building this solution now if tomorrow’s models will be able to do this out of the box?” This First Mover Fear (FMF) is kind of the opposite of the more well-known Fear of Missing Out (FoMO). You do not want to be the first and then be overrun by the crowd using a cheaper, more mainstream solution.
A typical scenario would be that, together with the client, a use case with a high benefit-cost ratio is identified and a PoC is developed. The results of the PoC look highly promising, with performance already adequate in many areas, but not all. As a next step, additional data for the edge cases would have to be collected and labeled, and further analyses conducted. ‘Operations’ are not very happy with the project though, because they find that they have enough work as it is. Moreover, it is hard to pinpoint exactly how much data will be needed to reach the required performance, and therefore more than one additional iteration may be needed. This is the moment when some managers and/or sponsors get nervous and look for the easy way out. Rather than take the risk of cost overruns, they will point to the learnings as ‘useful insights for improving our data strategy’. And they will wait for future, more powerful models to reach the required performance more easily.
Solution: Don’t be a chicken and finish what you have started when the results look promising! The fact that you are investigating a custom solution means that you have the potential and the budget to get ahead of the pack. This means it should be worth the investment to bring the project to a close. Otherwise it was probably not worth starting in the first place, and it would have been better to aim to be a fast follower rather than a leader.