Organizations across all industries are embracing AI. Gartner reports an adoption rate of 37% for the technology¹ and the pace is only accelerating.
Much of contemporary AI rests on statistical machine learning, a discipline built around the idea that computers can learn useful things from data with little to no human supervision, domain-specific knowledge, or custom coding. In more technical terms, machine learning seeks to make predictions about one group of variables given observations about another group, after having learned the pattern of joint occurrences of the variables from historical records. Furthermore, the machine learning approach seeks to uncover such relationships irrespective of the mechanism that generated the data. In fact, machine learning is about estimating a mathematical function that mimics the unknown data-generating mechanism without the need to model its inner workings explicitly. Given the domain-agnostic nature of machine learning, the tools and algorithms it provides are built to tackle a wide range of problems, from natural language processing to image analysis, and from maximizing profit and trading stocks to making movie recommendations.
When theory meets the real world, however, the story does not always flow as smoothly. First, despite the domain-agnostic promise of machine learning, most people recognize that an algorithm must be designed or selected, and then calibrated and fine-tuned, for a given end application. More critically, the output of machine learning is often difficult to contextualize, explain, and extrapolate, which jeopardizes the adoption of AI-generated recommendations. This is particularly important in the business world and in established industries, where machine learning is seen as a technology set to displace entrenched processes and decision-making paradigms that are perceived as “tried and true”, whether they concern managing relations with suppliers or integrating AI into medical diagnostics. Lastly, an often overlooked aspect of AI is the commitment an organization must make, and keep, to collect and supply clean, complete, and relevant data records to data-intensive machine learning algorithms on an ongoing basis. This burden can sometimes mean that a project fails or is abandoned before it even starts, or after only a short period of experimentation.
Let’s look at some of these issues in greater detail.
Contextual Adaptation and Predictability in Changing Environments
If we accept that machine learning is about making predictions (a word that most people associate with some event in the future), one may be surprised to learn that existing machine learning algorithms are much better at classifying and discriminating in the present than at projecting what they learned from the past into the future. Making predictions about the future obviously does not mean some kind of crystal-ball prophesying; it means discovering enough of the context around the data to make reasonable predictions even if future circumstances are not exactly the same as the historical ones.
Let’s illustrate this by means of a simple example from the business world. Suppose an analytical consultant is tasked with a merchandising assignment for a retailer. As part of this assignment, the analyst is interested in which product mixes sell best, along with the right timing (season, time of day, day of week), location (region, store, etc.), and pricing and promotions, often down to the SKU level. Suppose the analyst is given a dataset consisting of the offer codes that ran in the historical period, data about their sales and costs before and after launch, and market testing data for carefully designed test and control populations. Although this kind of data could be more than enough to analyze the offers’ past performance, what will it say about the future? For example, how should the analysis be extended to new offers that didn’t even exist in the historical period? Hoping to generalize, one may want to associate the multitude of transient offer codes with a smaller collection of stable attributes and apply machine learning to understand how those attributes drove performance. This may help, but only to some extent. For example, what if the offer to be launched is for a new product? Should the performance of the new offer be compared to past offers within the same category or subcategory? If so, based on what factors should new-product parameters such as price elasticity be adjusted, if at all, to make valid predictions? How should the results be extended when sales interact strongly with brand, competitive trends, and other factors? How should changes in the supplier network be factored into the analysis? How does all of this roll up into the bigger strategic picture of the company?
Questions like these are not merely theoretical musings. They frustrate business users on a daily basis when they work with the analytics available to them, and as a result they hurt the adoption and use of a wide range of advanced analytic technologies, including machine learning and AI.
Broadening the scope of a machine learning project unnecessarily and stuffing the model with all kinds of irrelevant dimensions and attributes is not the answer. The solution should instead be sought in approaches that enable higher levels of reasoning that weave the relevant context into the analysis. Knowledge graphs are among the tools that could potentially be exploited for deep discovery of similarities and relations among seemingly unrelated entities, allowing transfer of learning from the training set in the past to unseen environments in the future.
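To make the idea of transferring learning through a knowledge graph a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the tiny graph, the product names, the relation labels, and the use of Jaccard overlap between neighbor sets as a similarity measure are all hypothetical choices made for this example, and real knowledge-graph methods are considerably richer.

```python
from typing import Dict, List, Set, Tuple

# A toy knowledge graph: each entity maps to a set of (relation, target) facts.
# All entities and relations here are hypothetical, for illustration only.
Graph = Dict[str, Set[Tuple[str, str]]]

GRAPH: Graph = {
    "oat_milk": {
        ("category", "dairy_alternative"),
        ("used_in", "coffee"),
        ("brand_tier", "premium"),
    },
    "almond_milk": {
        ("category", "dairy_alternative"),
        ("used_in", "coffee"),
        ("brand_tier", "mid"),
    },
    "cola": {
        ("category", "soft_drink"),
        ("brand_tier", "premium"),
    },
}


def jaccard(a: Set[Tuple[str, str]], b: Set[Tuple[str, str]]) -> float:
    """Share of facts two entities have in common (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(graph: Graph, new_entity: str, candidates: List[str]) -> str:
    """Return the historical entity whose facts overlap most with the new one."""
    return max(candidates, key=lambda c: jaccard(graph[new_entity], graph[c]))


if __name__ == "__main__":
    # A new product with no sales history borrows context from its closest match.
    print(most_similar(GRAPH, "oat_milk", ["almond_milk", "cola"]))  # almond_milk
```

In this sketch, a new product with no history could inherit a starting estimate (for example, of price elasticity) from the most similar historical product, to be adjusted as real data arrives.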
Technologies that enable contextual adaptation constitute the third wave of AI². Among other interested parties, the U.S. Defense Advanced Research Projects Agency (DARPA) is actively seeking theories and applications that will support the goal of making machines understand and adapt to context much as the human brain does³.
Evidential Reasoning for Rapid Decision-Making
Most people who have spent time wrestling with real-world data agree that data is almost never complete. That is obviously a major issue, since data is the fuel for machine learning. In fact, the more advanced the algorithm, the more data-hungry it is likely to be.
There are many situations where the problem of missing data is encountered. In consumer and political surveys, non-responses or partial responses are quite common. In healthcare, missing data is the norm rather than the exception because, among many other reasons, much of the highly valuable data about a patient is entered manually. In adtech, observations about the exposure of prospective customers to marketing touch points are partial at best.
Missing data may be categorized in different ways. It may itself carry information, as in political surveys, but it might also be the result of random omissions or of siloed data configurations.
Even if the data is complete for the purpose of learning, applying the trained model for prediction and decision-making may again pose the missing data challenge. For example, while complete data about individuals might be available for the purpose of developing lead scoring models, the decision to assign a score to an individual using a trained model may have to be based on partial behavioral data that is collected cheaply in real time. Other cases involve variables that are knowable in principle but unknown within the time scale of decision-making because of delays and costs. For example, functional managers in all organizations often use relatively low-cost real-time signals and indicators for day-to-day decisions, rather than relying exclusively on detailed data and reports that only become available quarterly or annually. In a rapid response situation, critical decisions need to be made in real time without access to all the relevant data and facts.
To make machines respond quickly as the environment changes, they must be equipped with the ability to reason on the basis of evidential inputs. Unlike complete data, evidence is partial, meaning that it informs on a subset of the input space. Furthermore, evidence may be expressed in terms of probabilities, bounds, or other constraints instead of fixed values.
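As a rough illustration of reasoning on evidential inputs, the Python sketch below scores a lead when one input is known only as a probability and another only as a range. Everything in it is an assumption made for the example: the function lead_score stands in for an already trained model, its coefficients are invented, and averaging over the unobserved input is just one simple way to use probabilistic evidence.

```python
from typing import Tuple


def lead_score(visits: int, opened_email: bool) -> float:
    """Hypothetical stand-in for a trained model: probability a lead converts."""
    base = 0.05 + 0.02 * min(visits, 10)
    return base + (0.25 if opened_email else 0.0)


def expected_score(visits: int, p_opened: float) -> float:
    """Score under probabilistic evidence.

    Whether the email was opened is not observed; we only have a probability
    p_opened, so we average the model output over both possibilities.
    """
    return (p_opened * lead_score(visits, True)
            + (1.0 - p_opened) * lead_score(visits, False))


def score_bounds(visits_range: Tuple[int, int], p_opened: float) -> Tuple[float, float]:
    """Score under interval evidence.

    The visit count is known only to lie in an inclusive range, so we report
    the lowest and highest expected score over that range.
    """
    low, high = visits_range
    scores = [expected_score(v, p_opened) for v in range(low, high + 1)]
    return min(scores), max(scores)


if __name__ == "__main__":
    print(f"{expected_score(3, 0.4):.2f}")       # 0.21
    lo, hi = score_bounds((2, 5), 0.4)
    print(f"{lo:.2f} to {hi:.2f}")               # 0.19 to 0.25
```

The point of the sketch is that a decision can still be made, with an honest range attached, before the missing facts become available.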
Multimodal and Multi-Sourced Decision Making
Multimodal AI has received a lot of attention lately. As in most other fields within AI, the motivation is human experience. We see, hear, feel, smell, and taste. While each sensory experience informs us about some aspect of the world around us, our brain puts our multimodal experiences together to arrive at a holistic understanding of the environment. Our multimodal experience also helps us fill the gaps in sensory data. For example, seeing someone’s lip movements can help us better understand the words spoken.
Much of the recent interest in multimodal AI is about sensory modalities (visual, audio, text, etc.). Here our interest goes further, to include ways of combining and reasoning on knowledge acquired from multiple sources without resorting to the raw data behind the information provided by each source. There are many examples in the business world of why multimodality, as defined above, is relevant and in fact crucial. A functional manager might have access to multiple indicators, surveys, and maybe even machine learning output built on those and other data assets, each using raw data collected at different scales, granularities, time periods, and time resolutions, and each telling a different or even contradictory story about fundamentally interrelated aspects of the business.
Attempting to combine all the raw data behind every single observation into one unified machine learning model will be a futile exercise, given the heterogeneity of the data and the intricacy of the relations among them. Instead, a deep architecture may be conceived in which a separate machine learning model resides within each source, and the output of each source is weighted and combined adaptively to make predictions about an outcome of interest. For example, a merchandising solution may consist of multiple machine learning modules tied together, each module specializing in one particular aspect of the problem (product use, trends, supply, demand, etc.).
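One simple way to picture adaptive weighting of source-level outputs is sketched below in Python. This is an assumption-laden illustration rather than a prescribed design: the module names, the sample numbers, and the rule of weighting each source by the inverse of its recent mean squared error are all hypothetical choices, and many other combination schemes are possible.

```python
from typing import Dict, List


def inverse_error_weights(
    recent_errors: Dict[str, List[float]], eps: float = 1e-6
) -> Dict[str, float]:
    """Turn each source's recent prediction errors into a normalized weight.

    Sources with a smaller mean squared error receive a larger weight.
    eps avoids division by zero for a source with no recent error.
    """
    inverse = {
        name: 1.0 / (sum(e * e for e in errors) / len(errors) + eps)
        for name, errors in recent_errors.items()
    }
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def combine(predictions: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of the per-source predictions."""
    return sum(weights[name] * predictions[name] for name in predictions)


if __name__ == "__main__":
    # Hypothetical modules: each has its own model and its own recent errors.
    errors = {"demand": [1.0, -1.0], "trends": [2.0, -2.0]}
    weights = inverse_error_weights(errors)
    forecast = combine({"demand": 100.0, "trends": 120.0}, weights)
    print({name: round(w, 3) for name, w in weights.items()})
    print(round(forecast, 2))
```

Because the weights are recomputed from recent errors, a source that starts to drift loses influence without any of the underlying raw data having to be merged.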
Machine learning and AI have seen many defining moments and transitions over the decades. For the technology to see wider adoption in the business world, it is no longer enough to learn patterns in data from the past in order to make classifications in the present. Instead, it must discover enough of the context around the data to project the learned patterns into the future, under conditions that may not have been encountered in historical records. Additionally, as AI technologies continue to be embedded in real-time decision-making and human-machine interfaces (e.g. chatbots), the ability to reason based on evidential inputs will prove crucial. Lastly, machine learning problems can rarely be isolated from their contextual and environmental data, which are often available through external sources such as public knowledge graphs. To utilize such information effectively, it will not be practical to simply combine all potentially pertinent raw data; a more workable path is to exploit a deep architecture using a multi-stage, multimodal reasoning framework.
While fully functional systems that meet the challenges enumerated above have yet to materialize, the enabling technologies and building blocks are being actively investigated by multidisciplinary teams, and many of them are already being used in various applications and industries.
¹ Gartner Survey of More Than 3,000 CIOs Reveals That Enterprises Are Entering the Third Era of IT. https://www.gartner.com/en/newsroom/press-releases/2018-10-16-gartner-survey-of-more-than-3000-cios-reveals-that-enterprises-are-entering-the-third-era-of-it
² Third Wave AI: The Coming Revolution in Artificial Intelligence. https://medium.com/@scott_jones/third-wave-ai-the-coming-revolution-in-artificial-intelligence-1ffd4784b79e
³ DARPA Announces $2 Billion Campaign to Develop Next Wave of AI Technologies. https://www.darpa.mil/news-events/2018-09-07