**Deep Probabilistic Decision Machines for Building a Causally Generative Process Model-based Action Control in Enterprise AI**

*by **Hyungil Ahn, Ph.D.**, Chief Scientist, AI and Data Science, and the AI R&D team*

There are many machine learning libraries and automated tools that might be applied to enterprise AI problems. We find that these tools tend to support only “associative” AI problem setups such as regression, classification, and autoregressive time-series modeling¹. Typical uses of “associative” AI may still be effective in simple diagnostic or predictive contexts where we want to use probabilistic associations like correlations between observed features and targets (e.g., when we observe variable X, how likely are we to observe variable Y?). However, they have significant limitations in building high-value solutions in Enterprise AI®, where we want to make optimal decisions and controls in the “causal” context of complex dynamic processes (e.g., what would happen to output Y if we take action A in observed condition C and probabilistic latent state Z of the process²?).

**Simplified illustrations:**

**Associative AI**

For a given distribution center and SKU, when we observed *Y(last)%* and *Y(this)%* of lost sales (due to insufficient stock) in the last month and this month, what tended to be the most likely value *Y(next)%* of lost sales next month?

**Causal AI**

For a given distribution center and SKU, when we observed *D(last)* and *D(this)* of demand, *I(last)* and *I(this)* of inventory, *Y(last)%* and *Y(this)%* of lost sales, and *A(last)%* and *A(this)%* of shipment allocations from the hub in the last month and this month, what will be *D(next)* of demand, *I(next)* of inventory, and *Y(next)%* of lost sales next month if we change the shipment allocation to *A(next)%* next month?

*What can be a key representative problem framing in enterprise AI?*

We can generalize a variety of problems in enterprise AI as building generative process models from historical data on actual dynamic processes, then optimizing actions under diverse conditions by simulating the measurement outputs of those processes. The key representative setup for this dynamic process simulation and action planning typically involves causal interactions between the following four sets of variables:

- *Actions*: controllable inputs, planned under operating constraints (e.g., safety stock thresholds, planned receipts)
- *Conditions*: uncontrollable inputs, determined by the process context and previous action selections (e.g., DC locations or lead times)
- *Sensors*: observable but not directly controllable measurement outputs, predicted by the generative process model (e.g., customer demand, inventory levels, or actual receipts)
- *Targets*: observable but not directly controllable measurement outputs associated with KPIs or the optimization objective, predicted by the generative process model and optimized (e.g., lost sales, inventory holding costs, expedite costs, or value at risk)
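As a minimal sketch of how these four variable sets might be kept apart in code, the following Python dataclass separates causal inputs from predicted outputs. The class name `ProcessStep`, its methods, and the example variable names and values are hypothetical illustrations, not part of any DPDM library.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ProcessStep:
    """One observation of a process, split into the four variable roles."""
    actions: Dict[str, float]     # controllable inputs
    conditions: Dict[str, float]  # uncontrollable inputs
    sensors: Dict[str, float]     # measurement outputs predicted by the model
    targets: Dict[str, float]     # KPI-related outputs, predicted and optimized

    def model_inputs(self) -> Dict[str, float]:
        """Causal inputs only: what a simulation is given or allowed to set."""
        return {**self.conditions, **self.actions}

    def model_outputs(self) -> Dict[str, float]:
        """Everything the generative model has to predict."""
        return {**self.sensors, **self.targets}


# Hypothetical supply-chain example; all values are made up.
step = ProcessStep(
    actions={"safety_stock_threshold": 120.0},
    conditions={"lead_time_days": 14.0},
    sensors={"customer_demand": 95.0, "inventory_level": 140.0},
    targets={"lost_sales_pct": 2.5},
)
print(sorted(step.model_inputs()))
print(sorted(step.model_outputs()))
```

Keeping the roles explicit makes it harder to feed a sensor reading into the model as if it were a cause.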

We have developed universal deep probabilistic decision-making methods and code libraries for dynamic process modeling and control optimization, called “Deep Probabilistic Decision Machines (DPDM).” High-value solutions in enterprise AI require **“causally actionable” models of dynamic processes** involving the causal interactions of actions, conditions, sensor outcomes and target KPIs. The DPDM unifies **deep + probabilistic + reinforcement learning** to build generative process models from historical data and to make optimal decisions based on causal predictive simulation in enterprise AI, all on a **horizontal framework**.

Before explaining the DPDM in detail, we will first discuss some limitations and mistakes of popular data science practices, such as regression and autoregressive time-series modeling when developing enterprise AI applications. Then, we will broadly illustrate some application examples of the DPDM in enterprise AI.

*Simple regressions or classifications fail in solving the key representative enterprise AI problem*

Regression or classification setups typically prepare features (predicting variables) and targets (predicted variables) for each sample observation. For dynamic variables, we usually summarize or aggregate temporally changing values over the time period associated with the sample observation, transforming dynamic prediction problems into static ones.

A common mistake with regression or classification setups is a causal violation that happens when sensor measurement outputs are used as features in prediction tasks, in addition to actual causal inputs (action or condition inputs). This is problematic because those measurement outputs are not causes of change in target outputs but are merely associated with them.

For example, suppose that an AI system optimizes the mechanical qualities (target output variables) of produced coils in a steel mill by controlling action input variables such as set-point coiling temperatures for coils in each product category condition (e.g., different grades, hot/cold rolled). There are also sensor output variables such as actual-measured coiling temperatures and different zone temperatures over the process line. The “static” regression setup for predicting the mechanical qualities might mistakenly use as features the aggregated values of all sensor output variables and action input variables. That is, it does not properly distinguish sensor output variables from action and condition input variables, *using all of those variables as features to be fit and associated with the target variables*.

Even when a regression model fits well under this setup, it is not a causally actionable model because it uses sensor output variables as features. First, sensor output variables are not available at the time of the target prediction, unless they are separately predicted earlier. Second, sensor output variables tend to be multi-collinear or correlated with other inputs such as action variables, so a regression incorporating both sensor outputs and action inputs is likely to underweight the action variables that are actually causal. Third, there may also be multi-collinearity among the action and condition input variables, which requires compression or dimensionality reduction. Fourth, a regression model without sensor outcomes as features often has a low R-squared and is not very predictive. How can we compress the correlated action and condition input variables into key hidden features (or a latent state) and also use sensor output variables to learn our predictive model without violating the causal relationship?
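The second point can be illustrated with a small synthetic experiment. The sketch below assumes a made-up causal chain (set-point action, then measured sensor, then target) with invented coefficients and noise levels; it is not data from any real process. When the sensor is included as a feature, ordinary least squares assigns almost all of the weight to the sensor and almost none to the action, even though changing the action is what moves the target.

```python
import random


def slope_one_feature(x, y):
    """Least-squares slope of y on a single feature (with intercept)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def slopes_two_features(x1, x2, y):
    """Least-squares slopes of y on two features (with intercept)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [a - m1 for a in x1]
    c2 = [a - m2 for a in x2]
    cy = [a - my for a in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 * s12  # solve the 2x2 normal equations directly
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Assumed synthetic chain: action -> sensor -> target (all numbers invented).
rng = random.Random(0)
n = 5000
action = [rng.uniform(-1.0, 1.0) for _ in range(n)]      # e.g. set-point temperature
sensor = [a + rng.gauss(0.0, 0.3) for a in action]       # e.g. measured temperature
target = [2.0 * s + rng.gauss(0.0, 0.3) for s in sensor]  # e.g. mechanical quality

action_only = slope_one_feature(action, target)
joint_action, joint_sensor = slopes_two_features(action, sensor, target)

print(f"action-only regression: action weight = {action_only:.2f}")
print(f"joint regression: action weight = {joint_action:.2f}, "
      f"sensor weight = {joint_sensor:.2f}")
```

In this toy setup the action-only fit recovers a weight close to the true total effect of 2, while the joint fit drives the action weight toward 0, which would wrongly suggest that the set-point does not matter.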

*Static DPDM can achieve causally plausible predictions and decisions in open-loop situations*

Since our main interests are causally plausible predictions and decisions, which call for causal simulations with hypothetical actions and conditions, we should not use sensor output variables as features in the regression setup. Our open-loop version of DPDM (simply called “Static DPDM”) provides a theoretically grounded way to build a causally plausible model in the open-loop decision situation for sample observations involving action inputs, condition inputs, sensor outputs and target outputs. Static DPDM is a deep probabilistic latent variable model that generates (or predicts) both sensor and target outputs based on action and condition inputs. Following the deep generative modeling approach, we compress multi-collinear variables into probabilistic latent variables (Bayesian prior and posterior states). The prior state is based only on action and condition inputs, whereas the posterior state is based on all inputs and outputs. The probabilistic latent variables also capture the model uncertainty when the given inputs are not sufficient to estimate the underlying process state and predict the outputs (e.g., some input/output variables are missing in a partially observed environment, the input/output data are insufficient, and/or there is inherent stochastic process noise affecting the outputs).

We predictively generate not only the target outputs but also all sensor outputs. This enables the latent representation to exploit the causal relationship between the inputs (conditions and actions) and all sensor and target outputs. In the model training phase, sensor outputs serve as additional labeled data to be predicted alongside the target outputs. We train the prior latent state function, which takes only actions and conditions into account, together with the posterior latent state function, which is based on all actions, conditions, sensors and targets. The prior and posterior latent states are constrained to be similar. That is, the prior latent state distribution for given actions and conditions is similar to the average of the posterior latent state distributions of the sample observations with those actions and conditions. This guides the prior latent state (i.e., the process state estimated before observing actual outputs) to contain the key compressed information for predicting outputs for any given combination of actions and conditions in the model testing phase.
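One common way to express “the prior and posterior are constrained to be similar” in latent variable models is a Kullback-Leibler (KL) divergence penalty between the two distributions. The sketch below assumes both states are diagonal Gaussians and computes that penalty in closed form. The Gaussian form, the KL choice, the function name `gaussian_kl` and the example numbers are all assumptions for illustration; the article does not specify the exact constraint DPDM uses.

```python
import math
from typing import Sequence


def gaussian_kl(mu_q: Sequence[float], sigma_q: Sequence[float],
                mu_p: Sequence[float], sigma_p: Sequence[float]) -> float:
    """KL(q || p) between diagonal Gaussians, summed over latent dimensions.

    q plays the role of the posterior (sees actions, conditions, sensors, targets);
    p plays the role of the prior (sees actions and conditions only).
    """
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2) - 0.5
    return total


# Hypothetical two-dimensional latent state for one sample observation.
posterior_mu, posterior_sigma = [0.4, -0.2], [0.5, 0.5]
prior_mu, prior_sigma = [0.0, 0.0], [1.0, 1.0]

penalty = gaussian_kl(posterior_mu, posterior_sigma, prior_mu, prior_sigma)
print(f"KL penalty pulling the prior toward the posterior: {penalty:.4f}")
```

During training, a term like this would be added to the reconstruction loss for the sensor and target outputs, so that the prior learns to carry as much of the posterior's information as the actions and conditions allow.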

Once the deep probabilistic (DP) process model is trained, we can do model-based simulations to train an optimal action policy which selects the best actions to optimize key performance indicators (KPIs) for any given condition context.

*Autoregressive time-series models are not the best in dynamic predictions and decisions*

When all variables are dynamic and provided sequentially over each sample sequence spanning multiple time steps, a popular machine learning setup is autoregressive time-series modeling, where the predicted outputs (both targets and sensors) at the next time step are modeled as a function of all observed inputs and outputs (actions, conditions, sensors and targets) at the current and all past time steps.

A lot of enterprise AI problems require optimal action planning over a long future horizon (not a single next time step). That is, we want to optimize the total cumulative KPIs derived from predicted target outputs over multiple steps in the future. This setup can be viewed as a sequence-to-sequence framework.

Note that we predict not only the targets but also the sensors. This is because most problems require predicting target outputs over multiple time steps into the future, so the modeled function must be run iteratively, relying on previously predicted sensors and targets. To predict targets and sensors for any planned actions and given conditions over multiple future time steps, autoregressive models must be forced to take those actions and conditions as inputs at each step.
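The iterative procedure described above can be sketched as a simple rollout loop. Everything in the code below is hypothetical: the `rollout` helper, the `toy_model` dynamics (mean-reverting demand, inventory that cannot go negative) and all numbers are invented stand-ins for a fitted one-step autoregressive model.

```python
from typing import Callable, Dict, List, Sequence

Outputs = Dict[str, float]
OneStepModel = Callable[[Outputs, float, float], Outputs]


def rollout(model: OneStepModel, last_outputs: Outputs,
            planned_actions: Sequence[float],
            given_conditions: Sequence[float]) -> List[Outputs]:
    """Run a one-step model iteratively over a future horizon.

    Actions and conditions are forced to their planned values at every step,
    while predicted sensors and targets are fed back as the next step's inputs.
    """
    trajectory: List[Outputs] = []
    outputs = dict(last_outputs)
    for action, condition in zip(planned_actions, given_conditions):
        outputs = model(outputs, action, condition)
        trajectory.append(outputs)
    return trajectory


def toy_model(prev: Outputs, shipment: float, promotion: float) -> Outputs:
    """Hypothetical stand-in for a fitted model (invented dynamics)."""
    demand = 0.5 * prev["demand"] + 50.0 + 10.0 * promotion  # sensor
    available = prev["inventory"] + shipment
    return {
        "demand": demand,
        "inventory": max(available - demand, 0.0),   # sensor
        "lost_sales": max(demand - available, 0.0),  # target
    }


start = {"demand": 100.0, "inventory": 20.0, "lost_sales": 0.0}
plan_a = rollout(toy_model, start, [80.0, 120.0, 100.0], [0.0, 0.0, 0.0])
plan_b = rollout(toy_model, start, [50.0, 50.0, 50.0], [0.0, 0.0, 0.0])

print("plan A lost sales:", [step["lost_sales"] for step in plan_a])
print("plan B lost sales:", [step["lost_sales"] for step in plan_b])
```

Because each step consumes the previous step's predictions, any error in the predicted sensors propagates forward through the horizon.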

Although this setup benefits from dealing explicitly with dynamic variables, without any aggregation or summarization over the time steps of a sample sequence, it is not effective when the sensors, actions and conditions are high-dimensional or strongly cross-correlated. Without complicated preprocessing steps for variable selection, measurement noise filtering or dimensionality reduction, the model suffers from overfitting, from multi-collinearity or correlation between sensor outputs and other inputs, and from excessive sensitivity to measurement noise. In addition, when the provided historical sequence is not long enough to accurately estimate the underlying process state and predict the future sequence, there is no way to represent the model uncertainty.

*Dynamic DPDM can make predictive simulation-based controls in closed-loop situations*

One of the most important problems in enterprise AI is to make optimal sequential decisions or closed-loop controls over dynamic processes. In dynamic DPDM, we build a deep-probabilistic (DP) process model that enables the generative simulations of the likely future sensor and target sequence for any given future (or counterfactual) condition and action sequence at a latent process state. Then, we optimize our action policy for sequential decision making (DM) based on predictive simulations in model-based reinforcement learning (RL) or model-predictive control (MPC) approaches. The best action policy or controller can be designed to optimize the given KPIs, relying on the predicted experiences of sensor and target observations for different actions over the future time horizon.

## Dual-Process Model with Deep Probabilistic Latent States

How should we represent the latent “state” of the dynamic process? We propose a latent process state that follows the philosophy of dual process theory. The process state at each step is represented by an explicit probabilistic latent state (driven by Bayesian learning, as in probabilistic state-space models) that compresses the implicit deep deterministic hidden state (driven by deep learning, as in recurrent neural networks) and predictively generates the observed outputs for the provided inputs. There is also an interplay between the probabilistic model-based learning and the deep associative learning.

- The deep learning system (System 1, deterministic RNN) is much faster and more powerful at extracting key patterns and features from the past sequence of high-dimensional inputs and observations.
- A limited memory of past experiences (the length of the past sequence composed of past conditions, actions, sensors and targets) is typically used to represent the current process state in computational models. Also, observations are noisy and incomplete. This partial observability makes the underlying dynamic process stochastic, calling for a probabilistic state representation (System 2, probabilistic state-space model).

Deep learning would extract and represent key features from the sequence of past experiences. Dynamic processes tend to involve high-dimensional cross-correlations among action inputs, condition inputs, sensor outputs, and target outputs. Thus, it’s important that latent states can capture cross-correlations of variables. Bayesian learning would explicitly construct the dynamics of probabilistic prior and posterior states with implicitly learned features from deep learning. In addition, since the deterministic states (or hidden-layer outputs) in deep learning tend to have much higher dimensionality than the probabilistic states in Bayesian learning, probabilistic states would be very helpful for the overall model interpretability. The lower dimensional probabilistic states can be considered sequentially-learned variational auto-encoded features with the inputs of deterministic hidden states from deep learning.
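To make the two-part state concrete, here is a deliberately tiny scalar sketch of one time step. It assumes fixed, made-up weights in place of a trained RNN, a Gaussian prior read off the deterministic state, and a conjugate Gaussian update standing in for a learned posterior network. The function name `dual_process_step` and every constant are hypothetical; a real model would learn all of these from data.

```python
import math
import random
from typing import Optional, Tuple


def dual_process_step(h: float, z: float, action: float, condition: float,
                      observation: Optional[float] = None,
                      obs_sigma: float = 0.5,
                      rng: Optional[random.Random] = None
                      ) -> Tuple[float, float, float, float]:
    """One step of a toy deterministic-plus-probabilistic state update.

    Returns (h_next, z_next, mu, sigma), where (mu, sigma) describe the
    prior if no observation is given and the posterior otherwise.
    """
    if rng is None:
        rng = random.Random()

    # System 1: deterministic hidden state (stand-in for an RNN cell).
    h_next = math.tanh(0.6 * h + 0.3 * z + 0.4 * action + 0.2 * condition)

    # System 2: Gaussian prior over the low-dimensional latent state,
    # computed from the deterministic state only (no outputs seen yet).
    prior_mu, prior_sigma = 0.9 * h_next, 0.4

    if observation is None:
        mu, sigma = prior_mu, prior_sigma
    else:
        # Posterior: precision-weighted blend of prior and observed output.
        prior_prec = 1.0 / prior_sigma ** 2
        obs_prec = 1.0 / obs_sigma ** 2
        post_prec = prior_prec + obs_prec
        mu = (prior_prec * prior_mu + obs_prec * observation) / post_prec
        sigma = math.sqrt(1.0 / post_prec)

    z_next = rng.gauss(mu, sigma)  # sample the probabilistic latent state
    return h_next, z_next, mu, sigma


# Simulation (prior only) versus filtering (posterior after seeing an output).
_, _, prior_mu, prior_sigma = dual_process_step(0.0, 0.0, 1.0, 0.0,
                                                rng=random.Random(1))
_, _, post_mu, post_sigma = dual_process_step(0.0, 0.0, 1.0, 0.0,
                                              observation=1.0,
                                              rng=random.Random(1))
print(f"prior:     mean={prior_mu:.3f}, std={prior_sigma:.3f}")
print(f"posterior: mean={post_mu:.3f}, std={post_sigma:.3f}")
```

The prior path is what a forward simulation would use for hypothetical actions, while the posterior path tightens the state estimate once actual outputs are observed.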

Our favorite analogy for the need for deep probabilistic models, which combine deep learning (implicit, scalable, associative, deterministic, experience data-driven) with Bayesian probabilistic learning (interpretable, structured, prior/posterior, probabilistic, prediction model-based), is Kahneman’s dual process of the human mind, “System 1” and “System 2” (“Thinking, Fast and Slow”). Kahneman and Tversky describe the human mind and decision making in terms of an intuitive system and a deliberative system, calling them “System 1” and “System 2” respectively. *System 1* is experience-based associative learning, while *System 2* is prediction-based logical and probabilistic reasoning. We might say that *System 1* and *System 2* are also similar to Judea Pearl’s “observational, associative layer” and “causal and counterfactual layers”, respectively.

The deep probabilistic (DP) model of the process begins with a set of actions, conditions, sensors (measurement outcomes) and targets (KPI-related outcomes). Given the past sequence of these experiences, the model is trained to learn the latent states of the process up to the current time. This compresses the vast amount of historical information you have into a much smaller representation that keeps only the most critical pieces of information from the data. Beyond compressing the information, the model also maps it to prior and posterior probability distributions over the latent states, which are then used as part of the predictive simulation. With the current latent state probability distribution of the process, if you are given a set of actions and conditions across a future time horizon, you can simulate measurement outcomes such as inventory positions or fill rate, and target outcomes such as lost sales or value at risk. In other words, you can use the DP model to simulate the future unknowns (measurement and target outcomes), given what you plan or know in advance (actions and conditions).

The decision machine (DM) model of the process uses the DP model’s simulation of the future to try out different actions and select those that optimize your KPIs. Iterating over the future horizon, at each time step we know or determine the condition (static, or possibly changing according to previous action choices), select an action, and obtain simulated outputs. The DM controller tries out different action strategies in the simulated environment and rewards the strategies that optimize the key performance indicators associated with the desired targets. In this manner, the DM controller acts as a type of reinforcement learning agent, creating the flexibility to optimize the target outcome based on the cost considerations of your business. In the context of an enterprise with complex dynamic interactions and process variability, our DPDM model is unique in its ability to recommend the optimal actions in any scenario, and to continue learning based on the actions currently being taken.

*DPDM = Deep Learning + Bayesian Learning + Reinforcement Learning*

In many enterprise AI problems, our decision-making agent is not allowed to learn and optimize its decision-making or control policy through multiple trials in the real world. Generating the real-world experience data from executing numerous actions and conditions is very time-consuming, expensive and even dangerous when the input ranges happen not to be properly set. For this reason, it is very desirable to build a predictive simulation model using the historical process data that companies have accumulated. Then, simulated experiences for hypothetical actions and conditions enable the agent to make optimal decisions.

In this regard, enterprise AI can be contrasted with other AI fields (e.g., robotics, games like *Go* and *Starcraft*, self-driving cars) that begin with separately-developed rich and accurate physical-world simulators or have the ability to experiment at low cost in real-world scenarios. Therefore, learning a predictive simulation model based on the historical process data available is the first task in enterprise AI applications. This is difficult to do with sufficient explainability, probabilistic richness and accuracy, given the kinds of data we encounter in enterprise AI.

We have developed a framework and platform for building these predictive simulators and model-based control/decision engines in enterprise AI contexts. The deep probabilistic (DP) dynamic model tells us how the state of the system will evolve as a result of the actions one takes based on observed sequences (conditions, actions, observations over time). Defining performance KPIs, or “targets”, one can use this model in an RL approach to construct a decision-making engine that suggests a “next best action” for any current state. The agent’s initial action policy can be learned from simulated experiences, or a model predictive control (MPC) approach can be considered using Monte-Carlo simulations at every timestep. The deployment of the controller or decision recommendations can run on the edge or in the cloud.
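The MPC variant with Monte Carlo simulations can be sketched as follows. This is a simplified illustration under stated assumptions: `toy_simulator` is an invented stochastic stand-in for a trained DP model, the planner only compares a few constant-action plans, and the cost weights are made up. A production controller would search a richer action space and use the learned model.

```python
import random
from typing import Callable, Sequence, Tuple

Simulator = Callable[[float, float, random.Random], Tuple[float, float]]


def toy_simulator(inventory: float, shipment: float,
                  rng: random.Random) -> Tuple[float, float]:
    """Hypothetical one-step process model: returns (next_inventory, cost)."""
    demand = max(rng.gauss(100.0, 15.0), 0.0)
    available = inventory + shipment
    lost_sales = max(demand - available, 0.0)
    next_inventory = max(available - demand, 0.0)
    cost = 5.0 * lost_sales + 1.0 * next_inventory  # invented KPI weights
    return next_inventory, cost


def mpc_next_action(simulate: Simulator, state: float,
                    candidates: Sequence[float], horizon: int,
                    n_samples: int, rng: random.Random) -> float:
    """Pick the candidate with the lowest average simulated cost.

    Each candidate is evaluated as a constant plan over the horizon using
    Monte Carlo rollouts. Only the first step would be executed, and the
    planning would be repeated at the next time step with the new state.
    """
    best_action, best_cost = candidates[0], float("inf")
    for action in candidates:
        total = 0.0
        for _ in range(n_samples):
            s = state
            for _ in range(horizon):
                s, cost = simulate(s, action, rng)
                total += cost
        mean_cost = total / n_samples
        if mean_cost < best_cost:
            best_action, best_cost = action, mean_cost
    return best_action


rng = random.Random(0)
next_shipment = mpc_next_action(toy_simulator, state=20.0,
                                candidates=[60.0, 100.0, 140.0],
                                horizon=4, n_samples=200, rng=rng)
print("recommended next shipment:", next_shipment)
```

In this toy setting the planner prefers the shipment that roughly matches expected demand, since smaller shipments incur lost sales and larger ones accumulate holding cost.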

## The following examples illustrate some applications of dynamic DPDM.

**Predictive Quality Control in Closed-Loop Situations**

Consider an industrial dynamic process that produces products using user-controllable actions and exogenous planned conditions as inputs. For example, the process may involve actions like controlling RPM setpoints for rotating machines and temperature setpoints for heating equipment over time. The process may also run under various conditions, comprising static conditions such as different line locations as well as dynamic conditions like the scheduled crew sequence, the aimed dimensions of products changing over batches, and time information for process runs. Sensor measurement outputs related to the observed process state may include actual measured RPMs and temperatures changing over time. In addition, there may be target measurement outputs related to the process performance or KPIs, such as the actual produced dimensions of products. The dynamic DPDM provides optimal actions for different conditions over time, reducing the error between the actual and aimed produced dimensions.

**Energy Consumption Prediction and Control**

Similarly, consider predictive control of energy consumption in a steel mill with rechargeable batteries. The target outputs may be the total consumed energy per unit time or a defined peak usage per period (e.g., the monthly peak usage over any 15-min window of cumulative consumed energy, on which the electricity cost depends), which we optimize with the controllable action inputs of battery discharging or recharging. With predictions of future peak energy consumption, batteries can be set to discharge to reduce the direct energy usage from the electricity supplier and shave off peak usage. Conversely, predicting future low energy consumption allows batteries to be recharged during that time without increasing the electricity cost. Sensor outputs would include elapsed times since different events such as heat start, meltdown start, load and tap. Condition inputs specific to the properties of heats (e.g., raw material, steel grade, temperature) would vary when there are transitions in heats or batched tasks.
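The peak-usage target can be made concrete with a short calculation. The sketch below assumes one energy reading per minute and computes the largest cumulative consumption over any 15 consecutive readings, before and after a hypothetical battery discharge. The load profile, the discharge amount and the function name `peak_window_energy` are invented for illustration.

```python
from typing import List, Sequence


def peak_window_energy(energy_per_min: Sequence[float], window: int = 15) -> float:
    """Largest cumulative energy over any `window` consecutive readings."""
    if len(energy_per_min) < window:
        raise ValueError("need at least one full window of readings")
    current = sum(energy_per_min[:window])
    peak = current
    for i in range(window, len(energy_per_min)):
        # Slide the window by one reading: add the newest, drop the oldest.
        current += energy_per_min[i] - energy_per_min[i - window]
        peak = max(peak, current)
    return peak


# Invented load profile: a 15-minute burst of high consumption.
load: List[float] = [1.0] * 30 + [3.0] * 15 + [1.0] * 30

# Hypothetical action: discharge the battery at 1.0 unit/min during the burst,
# so that less energy is drawn from the electricity supplier.
discharge: List[float] = [0.0] * 30 + [1.0] * 15 + [0.0] * 30
grid_draw = [l - d for l, d in zip(load, discharge)]

print("peak without battery:", peak_window_energy(load))
print("peak with battery:   ", peak_window_energy(grid_draw))
```

A controller would need a forecast of the burst to schedule the discharge in time, which is where the predictive model comes in.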

**Asset Health or Fleet Health Maintenance**

For predictive maintenance of assets in a plant, the target outputs would be the remaining time to different types of failures of assets or components, while the controllable actions would be replacement and/or inspection actions. Conditions might indicate the asset type, location, component type (often given in hierarchical categories), the historical data and planned schedule of operation usage (e.g., asset and component runtimes) and the external temperature (if relevant). Sensor outputs would include all key sensor measurements relevant for indicating the observed process state. Note that our deep probabilistic model simulates the probability distribution of the remaining time to different types of failures for any given future conditions and actions. We optimize the risk-based maintenance action policy for a given cost objective, using process simulations.

**Demand Prediction and Supply Management**

We can use the same framework for a business dynamic process in demand forecasting and supply chain management involving different and hierarchical regional, temporal, channel, and product sales conditions. For example, we could model the inventory and demand levels for a wide variety of stock keeping units (SKUs) that belong to certain families/categories at a variety of sales/distribution centers belonging to specific regions within the global supply chain. We would want to take into account the temporal nature of these variations as well as incorporate external signals, such as market signals or internal planning information. The process would involve dynamic control actions like price and promotions influencing the demand, and safety stock levels, expedites, inventory rebalancing, and production changes influencing the supply. There would be sensor outputs related to the process state like current inventory levels, production quantities, and competitors’ sales trends over time. Some target outputs to be predicted and optimized could include value at risk, such as a composite of lost sales, inventory holding costs, expedite costs, obsolescence costs, and contractual penalties, and sales order volume over time.

**Conclusions**

In this article, we have given an overview of our Deep Probabilistic Decision Machines (DPDM) approach, which builds a data-driven and knowledge-augmented deep probabilistic predictive simulation model of dynamic processes and uses it for decision making and control in a model-based RL/MPC framework. This generalizable solution framework for deep probabilistic decision-making AI is based on the concepts of actions, conditions, sensors and targets for modeling actionable and hierarchical systems. Dynamic latent states compress experiences and capture key sensory features for predictable and explainable models. The generative model can be used to simulate the likely future observation sequence for any given future (or counterfactual) condition and action sequence at a process state. This also enables model-based decision making and control. Application areas include predictive quality control, energy prediction and control, asset health maintenance, production scheduling, and demand prediction and supply management in enterprise AI. DPDM applications and results in a growing number of client problems show the promise of this framework. In our next articles, we will present concrete application examples of Static and Dynamic DPDMs.

**Citations and Extrapolations**

¹ *Judea Pearl argued that the field of AI got stuck in probabilistic associations, correlations, curve fitting or the “associative” level of the three layer causal hierarchy.*

² *The causal model infers the probabilistic latent state Z of the process to represent key hidden features and capture the uncertainty due to incomplete observations.*

- Thinking, Fast and Slow
- The Book of Why
- The AI R&D team (Santiago Olivar; Hershel Mehta; Young Chol Song; Siva Devarakonda)