Moving AI from PoC Stage to Production
After several AI projects, I have realized that many PoCs fail to reach the production stage, and only a few make it to release.
In 2019, many companies began applying AI solutions with impressive results, but only a few have developed full-scale AI capabilities that bring real added value to the company.
Based on my experience, fewer than 20% of ML PoCs make it to production, and even those that succeed may well fail during the “industrialization” of the AI solution.
Most companies start by proving that an AI solution will, in fact, cut costs, improve customer experience, or in some other way differentiate the business, through a proof of concept (PoC).
PoCs are usually carried out with rather simple algorithms using immediately available training data or internally labeled data. The main goal is to show that an algorithm can be trained to address a particular use case with a small amount of training data.
If they succeed, the project continues towards the production stage.
The production stage represents a higher level of complexity for your AI project. Indeed, you are no longer trying to prove that the solution works, but that it can integrate with the company’s infrastructure and perform well in real-life conditions.
To be successful, ML projects need to think big from the start, taking into consideration the company’s structure, size, customers, and internal workflows.
I have noticed that PoCs tend to hit infrastructure, performance, knowledge, and data-management hurdles that prevent them from reaching the next stage: production. This step tends to be underestimated during an AI project. At this stage, it is not rare to completely change the way your system works, and it is safe to assume that new problems will arise as you get closer to the final release of the solution.
Moving towards the final stage of AI integration, where AI is integrated across multiple lines of business and perhaps made available to the average user or customer, requires tackling enterprise-scale infrastructure, security, and support challenges.
Production: a system that is being used in real life. It is not a PoC to test whether something will work, or an experiment performed on sample data; it is real-life data being used to solve a real-life problem.
Many times, I have seen AI vendors fail to prove the concept as originally conceived. But why does the transition seem to be such a nightmare for AI projects? Well, most of the time, AI projects never even reach the production stage because of the following elements:
I have noticed that a company can have a perfect business issue for AI to address, yet give up on the project when faced with data issues or the new workflows it requires. Indeed, organizations have to navigate a series of issues related to software, data security, and the quantity of new training data before even reaching the production stage.
Another pitfall is underestimating the cost of building a full-scale, functioning AI system. Far more investment is needed to get a prototype into production, and management needs to make sure they can afford it.
Your ML proof of concept is the first step of a long process. You need to look beyond it to the issues that arise when you scale it to a full-scale production system.
Why AI PoCs fail
The proof-of-concept roadmap for an AI project poses several challenges. From the lack of data and legal issues to internal resistance to AI-enabled applications or the capacity to integrate, companies have to analyze several factors before putting models in production.
In my opinion, organizations should invest in many proofs of concept: they can quickly learn about their potential, improve their data culture, kill AI PoCs that aren’t going anywhere, and identify the most promising approaches to keep monitoring and investing resources in. I have seen companies try to earn money with their first PoC by choosing a complex problem to solve through ML… that is the best way to fail!
Companies should also take into consideration that the skills needed to build a proof of concept are very different from the skills needed to scale an idea to production. It may sound obvious, but without a structure to support AI integration, even the best projects will die.
An AI project needs to be supported by management; without an appetite for long-term investment, an AI application will never reach any meaningful level of scale or usefulness. It takes time and patience to successfully develop such projects.
In order to make your PoC a success, it is mandatory to conduct extensive research, build a cross-functional team, and investigate and test a wide range of hardware specifications. It can also be worthwhile to consult external experts to fine-tune the model. While we managed to come up with the prototype within 2–3 weeks of initial research, the next steps will take much longer and require significant investments of money and time.
Based on my experience, a good PoC takes about 1 to 2 months. Indeed, the data-gathering process is very time-consuming. Needless to say, most companies have amazing ideas about how to use AI but don’t have the right data…
For example, if during the PoC the algorithm demonstrated its ability to recognize faces photographed in the same light, at the same distance and angle, during the pilot the algorithm will need to be exposed to variations in lighting, distance, angle, skin tone, gender, and more, which of course implies far more data.
Generating the input data a machine learning model needs for a proof of concept is one thing; doing it continuously and at scale is another. This aspect is often underestimated.
After several projects working with disparate and imperfect datasets, I have come to the conclusion that people trying to move desktop-scale ML algorithms into production are likely to massively underestimate the time and energy required to get the data into a form their algorithms can use.
It is key to minimize the gap between your real-world requirements and the PoC dataset. As a consequence, I strongly recommend using real-world data.
It takes time to build a solid and relevant dataset. There are specific processes that must be followed to generate data that meets the standards needed to properly train a predictive model.
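As a sketch of what such a process can look like, here is a minimal, hypothetical quality gate that rejects records with missing fields or duplicate inputs before they reach training (field names like `image_path` and `label` are illustrative assumptions, not a standard schema):

```python
def validate_records(records, required_fields=("image_path", "label")):
    """Return (clean, rejected) lists after basic dataset quality checks."""
    seen = set()
    clean, rejected = [], []
    for rec in records:
        # Reject records missing any required field (or with an empty value).
        if any(rec.get(f) in (None, "") for f in required_fields):
            rejected.append((rec, "missing_field"))
            continue
        # Reject exact duplicates, keyed on the input reference.
        key = rec["image_path"]
        if key in seen:
            rejected.append((rec, "duplicate"))
            continue
        seen.add(key)
        clean.append(rec)
    return clean, rejected
```

In a real pipeline this kind of gate would also check value ranges, label consistency, and class balance, and it would run continuously as new data arrives, not once.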
When the PoC is successful, some AI teams assume they can do the training-data preparation for the whole project by themselves. I believe they underestimate how hard it will be for the company to deliver the required data (work silos, slow organizations, etc.). At this step, you usually start to understand how the company really works.
Indeed, training the algorithm for the many additional use cases that must be part of a production system creates a demand for dramatically, often overwhelmingly, larger amounts of data.
The Pilot phase
A successful PoC will convince the project owner to fund a pilot phase. The pilot is a step between the PoC and the project in production: organizations don’t turn off any other systems or change their staffing. The pilot runs alongside existing systems while adjustments are made to the algorithm as it is trained. It is a necessary step, since many issues and workflow challenges will surface during the process.
I noticed that once the PoC is confirmed, the task becomes way more complex for the development team. Indeed, it becomes key to integrate more and more data coming from different sources. The other challenge is about the integration of the AI solution within the technical infrastructure of an organization as well as adapting current workflows to the new solution.
The pilot’s duration is largely determined by the level of accuracy required for production. Obviously, some projects will require more time to reach a certain maturity level (self-driving vehicles, etc.), while others can show positive ROI at significantly lower levels of confidence.
Most of the time, the pilot does not have diversified enough data to work on a grand scale.
From PoC to production
As we said, very few project teams manage to move past this stage. Indeed, most projects require vastly different resources to evolve from PoC to production. During the production step, it often becomes clear that the project will need more time to be fully operational as new issues are discovered. Moreover, the more end users are involved, the clearer it becomes how far the PoC is from reality.
In general, projects should start in small, agile teams that involve end users, C-level project owners, and data scientists. But once the validity and feasibility of the idea are confirmed, the institution should be fully committed to investing the resources required to see the project through to production.
All forms of data modeling for a PoC have to simplify and yet represent reality, and in the process some authenticity is always lost. This creates risks for machine learning, as real-world data may be more prone to modeling issues than the training data used for the PoC.
The obvious solution is to add more detail to the model: more fields, tables, relationships, and so on. But the more detailed you make the model, the harder it is to work with and understand. It also assumes you can get more data. I have seen many projects fail because of a lack of data, and others in which we used data augmentation techniques (a good way to reduce overfitting in image recognition projects).
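To illustrate the augmentation idea, here is a toy sketch of two classic image augmentations, random horizontal flip and random shifted crop, written with plain NumPy; in a real project you would typically rely on your framework’s augmentation utilities instead:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, pad=4):
    """Randomly flip and shift one (H, W, C) image: a toy version of the
    augmentations commonly used to reduce overfitting in image models."""
    h, w = image.shape[:2]
    # Random horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        image = image[:, ::-1]
    # Pad by `pad` pixels on each side, then take a random h x w crop,
    # so the object shifts slightly every time the sample is drawn.
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w]
```

Each call produces a slightly different view of the same labeled example, which multiplies the effective size of a small training set.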
Deploying regular software applications is hard — but when that software is based on Machine Learning, it’s worse! Machine Learning has a few unique features that make deploying it at scale harder.
In my latest project, I realized that once your algorithms are trained, they are not called at a constant rate. For instance, your customers or end users will only call them when they need them.
This makes managing API calls key: you need to make sure you are not paying for servers you don’t need. As soon as the running costs of your AI solution become too high, the company may hesitate to continue with it.
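One common pattern for keeping costs down when calls are infrequent is to load the model lazily, on the first request, rather than holding resources from startup. A minimal, hypothetical sketch (`load_fn` stands in for whatever actually loads your model):

```python
import threading

class LazyModel:
    """Load an expensive model only when the first prediction is requested,
    so a rarely-called service doesn't pay the startup cost up front."""

    def __init__(self, load_fn):
        self._load_fn = load_fn
        self._model = None
        self._lock = threading.Lock()

    def predict(self, x):
        if self._model is None:
            with self._lock:  # avoid two threads loading the model twice
                if self._model is None:
                    self._model = self._load_fn()
        return self._model(x)
```

The same idea underlies serverless and scale-to-zero deployments: pay for inference when it happens, not for idle capacity.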
As you move into production, things start to get more complicated from a data perspective. Unless the algorithm’s problem space is very simple or completely static, training never ends. Problem spaces evolve. New use cases evolve. And pressure from competitors who are also trying to differentiate through ML means that organizations have to expose their models to ever more obscure edge cases. And at already-high levels of model confidence, each incremental 1% increase is staggeringly expensive in terms of training data.
AI integration in Information Systems
The AI solution might be ready, but you are not done yet. Indeed, full-scale implementation also involves interfacing AI with production information systems and architectures. Through my experience, I have come to the conclusion that one of the biggest issues in AI deployments is the difficulty of integrating cognitive projects with existing processes and systems. The best option is to expose machine learning models as APIs or embed them as program code modules within existing systems.
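As a sketch of the “model as a code module” option, here is a hypothetical JSON-in/JSON-out wrapper that existing systems could call like any other function, or mount behind whatever API layer they already use (the request schema and version tag are illustrative assumptions):

```python
import json

MODEL_VERSION = "1.0.0"  # hypothetical version tag, returned for traceability

def score(model, request_body: str) -> str:
    """JSON-in / JSON-out wrapper around a trained model, so existing
    systems can call it like any other service endpoint or module."""
    payload = json.loads(request_body)
    features = payload["features"]  # assumed request schema
    prediction = model(features)
    return json.dumps({"prediction": prediction,
                       "model_version": MODEL_VERSION})
```

Keeping the interface to plain JSON means the surrounding systems never need to know which framework the model was trained with, and the version tag makes every prediction traceable to a specific model build.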
When it comes to AI projects, PoCs are very helpful whether they succeed or not. I believe that through a deliberate process (data analysis, company adaptation, etc.), we can significantly increase the chances that a PoC reaches the production stage.
Companies need to prepare for working with AI vendors in ways that do not put their data at risk or create long-term dependencies but instead strengthen competitive advantage.