Statistical Learning in Artificial Intelligence Systems

Uncertainty is a key element of many real-world artificial intelligence (AI) environments. By uncertainty, we refer to the characteristics that prevent an AI agent from knowing the precise outcome of a specific state-action combination in a given scenario. Uncertainty is typically the result of nondeterministic and partially observable environments. Statistical learning has become a powerful weapon for overcoming uncertainty in AI scenarios and, consequently, has been widely implemented in many modern AI frameworks.

When we talk about statistical learning, there is one name that comes to mind: Bayes. Even though most of modern statistical theory was the result of the work of the French mathematician Pierre-Simon de Laplace, who was born roughly five decades after Bayes, it is Bayes who got all the credit in the theorem that bears his name. Thomas Bayes was an eighteenth-century British clergyman who described new ways to think about chance and uncertainty. Laplace codified Bayes's ideas into a single theorem that helps us reason about almost anything in the world that carries an ounce of uncertainty:

P(cause|effect) = P(cause) x P(effect|cause) / P(effect).

By P(A|B) we denote the probability that A occurs given B. Replacing cause and effect in the previous equation with the probabilities of any state-action combination in an AI environment, we arrive at the fundamentals of Bayesian learning. Essentially, Bayesian or statistical learning focuses on calculating the probability of each hypothesis and making predictions accordingly.
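To make the idea concrete, here is a minimal sketch of Bayesian learning over a small discrete hypothesis space. The coin-bias scenario, the candidate hypotheses, and the observed flips are illustrative assumptions rather than anything prescribed above.

```python
# A minimal sketch of Bayesian learning over a discrete hypothesis space.
# The coin-bias scenario, candidate hypotheses, and observed flips are
# illustrative assumptions.

hypotheses = [0.1, 0.3, 0.5, 0.7, 0.9]                       # candidate values of P(heads)
posterior = {h: 1.0 / len(hypotheses) for h in hypotheses}   # uniform prior

observations = [1, 1, 0, 1, 1]                               # 1 = heads, 0 = tails

for flip in observations:
    # Bayes' rule: P(h | data) is proportional to P(data | h) * P(h)
    for h in hypotheses:
        likelihood = h if flip == 1 else 1.0 - h
        posterior[h] *= likelihood
    total = sum(posterior.values())
    posterior = {h: p / total for h, p in posterior.items()}

# Bayesian prediction: average every hypothesis's prediction,
# weighted by its posterior probability.
p_next_heads = sum(h * p for h, p in posterior.items())
print(posterior)
print("P(next flip is heads) =", round(p_next_heads, 3))
```

Note that the final prediction averages over all five hypotheses; no single one is ever committed to.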

Although Bayesian learning seems theoretically trivial, it runs into many roadblocks in real-world AI solutions. Specifically, Bayesian learning models frequently prove impractical in environments in which the number of hypotheses is very large or infinite. A very well-known AI algorithm that tries to address those limitations is the maximum a posteriori (MAP) model, which simply makes predictions based on the single most probable hypothesis.
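The difference is easy to see in code. The following hedged sketch uses the same kind of coin-bias setup as above but, instead of averaging over every hypothesis, commits to the single most probable one; the data and hypothesis values are again illustrative assumptions.

```python
# A hedged sketch of maximum a posteriori (MAP) prediction on the same kind of
# coin-bias hypothesis space; the data and hypothesis values are assumptions.

hypotheses = [0.1, 0.3, 0.5, 0.7, 0.9]
observations = [1, 1, 0, 1, 1]

# An unnormalized posterior is enough for MAP: dividing every hypothesis by the
# same constant does not change which one is the argmax.
score = {h: 1.0 for h in hypotheses}                 # uniform prior
for flip in observations:
    for h in hypotheses:
        score[h] *= h if flip == 1 else 1.0 - h

h_map = max(score, key=score.get)                    # single most probable hypothesis
print("MAP hypothesis:", h_map)
print("P(next flip is heads | MAP) =", h_map)        # predict with that hypothesis alone
```

Because only the most probable hypothesis matters, the normalization and averaging steps disappear, which is exactly what makes MAP more tractable when the hypothesis space is huge.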

Maybe the most notorious algorithm in statistical learning is the Naive Bayes model (also referred to as the Bayesian classifier), which uses Bayesian networks to model environments in which the effects are independent given the cause. The model is “naive” precisely because it assumes that attributes are independent of each other given the class.
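As an illustration, here is a minimal Naive Bayes classifier built with scikit-learn (assumed to be installed); the toy weather-style dataset and its integer encoding are made up purely for this example.

```python
# A minimal Naive Bayes classifier using scikit-learn (assumed installed).
# The toy dataset and its integer encoding are illustrative assumptions.
from sklearn.naive_bayes import CategoricalNB

# Features: [outlook, windy] encoded as small integers; label: play (1) or not (0).
X = [[0, 0], [0, 1], [1, 0], [2, 0], [2, 1], [1, 1], [0, 0]]
y = [0, 0, 1, 1, 0, 1, 0]

model = CategoricalNB()      # treats attributes as independent given the class
model.fit(X, y)

print(model.predict([[1, 0]]))          # most probable class for a new observation
print(model.predict_proba([[1, 0]]))    # posterior P(class | attributes)
```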

AI models like Naive Bayes are only applicable in fully observable environments, which don’t resemble many real-world AI environments. For instance, many AI environments contain hidden variables that are not available in the training data set. Let’s take an example from the health care world: electronic medical records typically include observations about the symptoms of a disease rather than about the disease itself. In those scenarios, algorithms such as unsupervised clustering using Mixtures of Gaussians, learning Bayesian networks, and learning hidden Markov models are typically a good choice.
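The sketch below shows the Mixture of Gaussians option on synthetic data, with the mixture component standing in for a hidden "disease" variable behind observed symptoms; the data, the two-condition setup, and the scikit-learn/NumPy dependencies are all assumptions for illustration.

```python
# A hedged sketch of unsupervised clustering with a Mixture of Gaussians,
# where the mixture component plays the role of a hidden "disease" variable.
# The synthetic data and library dependencies are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two hidden conditions, each generating symptom measurements (say, temperature
# and heart rate) around a different mean; the condition itself is never observed.
symptoms = np.vstack([
    rng.normal(loc=[37.0, 60.0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[39.5, 95.0], scale=0.5, size=(50, 2)),
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(symptoms)
print(gmm.means_)                    # recovered centers of the hidden groups
print(gmm.predict(symptoms[:5]))     # inferred hidden component for a few patients
```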

As it turns out, statistical learning is not a solution to every AI problem. Laplace’s original idea that any form of human knowledge can be codified in a statistical network failed to account for many aspects of human reasoning. A well-known limitation of statistical learning models is the absence of logic, which is key to many forms of knowledge. As a result, statistical learning techniques are not applicable in many real-world AI scenarios. That will be the subject of the second part of this essay.