Modelling Aleatoric & Epistemic Uncertainty in TensorFlow

Santhosh · Published in The Startup · 5 min read · Apr 16, 2020

Intelligence is not to act when you are uncertain

There are two types of errors associated with any machine learning or deep learning model:

  1. Reducible error (with more data points, this error can be reduced)
  2. Irreducible error (this error comes from the inherent variance in the data)

The overall error made by any model is a combination of the above two errors.

Aleatoric Uncertainty:

When we run a lab experiment, the values measured across multiple trials will never be exactly the same: even with identical inputs, the output measurement differs every time. This is what we call aleatoric uncertainty, i.e. uncertainty about the y values caused by noise inherent in the data-generating process. Aleatoric uncertainty is nothing but the irreducible error, because nothing can be done to change the real-world process itself.

By default, data from any real-world process will have aleatoric uncertainty.

In the figure above, every x value has multiple y values associated with it.
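A tiny NumPy simulation makes this concrete: repeated trials at the same input give different outputs because of the noise. The signal 3·x and the noise scale below are made-up illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten repeated "measurements" at the exact same input x = 2.0.
# The true signal is 3 * x, but the process adds noise, so every trial differs.
x = 2.0
trials = 3.0 * x + rng.normal(loc=0.0, scale=0.5, size=10)
```

No amount of extra data removes this spread; more trials only pin down its statistics (mean and variance) more precisely.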

Epistemic Uncertainty:

Epistemic uncertainty is our ignorance about the correct model that generated the data. This includes uncertainty in the model, its parameters, and its convergence.

Epistemic uncertainty arises when we have a limited understanding of the real-world process we are modelling. This happens when, due to a lack of domain knowledge, we are not able to capture all the input variables that affect the target variable.

This is a reducible error: with more knowledge about the process we can do better. If we can take more measurements, conduct more tests, or otherwise "buy" more information, this type of uncertainty can be reduced.

Why is uncertainty important?

When you look at the final layer of an image classification network, for any given image one neuron will have the maximum value, but that value alone never tells you how confident the model is about its prediction.

In every real-world problem what we want is not just the predictions but also confidence or certainty about that prediction.

Examples

When you are building a self-driving car and it predicts a person walking on the road, you need a confidence associated with that prediction to act on it properly.

Similarly, in finance, when we build a stock-trading bot, we don't want the bot to make trades when it's not confident and drive us to bankruptcy.

Many applications of machine learning depend on good estimation of uncertainty:

● Forecasting

● Decision making

● Learning from limited, noisy, and missing data

● Learning complex personalized models

● Data compression

● Automating scientific modeling, discovery, and experiment design

Bayesian Neural Networks

The Bayesian approach offers an intuitive framework for representing beliefs and updating those beliefs based on new data.

Bayesian neural networks are similar to normal neural nets except that every parameter in each layer has a probability distribution with a mean and a variance. This way we double the number of parameters we learn.

In the Bayesian world of modelling, everything has a probability distribution associated with it, including the weights and biases.

Every time you run a Bayesian model it samples the weights and biases from the learned distributions, so we effectively get multiple models, like an ensemble of different linear regression lines.
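This idea can be sketched in plain NumPy: if training has left us with a distribution over a linear model's slope and intercept (the means and standard deviations below are made-up), every draw gives a different concrete regression line.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical learned posterior over a linear model's parameters
w_mu, w_sigma = 2.0, 0.3   # slope distribution
b_mu, b_sigma = 1.0, 0.1   # intercept distribution

x = np.linspace(0.0, 1.0, 5)

# Each draw of (w, b) is one sampled model, i.e. one regression line
lines = [rng.normal(w_mu, w_sigma) * x + rng.normal(b_mu, b_sigma)
         for _ in range(4)]
```

The spread between these sampled lines is exactly what the epistemic-uncertainty estimate measures.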

Tensorflow Probability

TensorFlow Probability is a library for probabilistic reasoning and statistical analysis in TensorFlow. This library helps us apply probabilistic programming to many real-world applications.

https://medium.com/tensorflow/an-introduction-to-probabilistic-programming-now-available-in-tensorflow-probability-6dcc003ca29e

Uncertainty Modeling Using Bayesian Methods

We will use TensorFlow Probability to model aleatoric and epistemic uncertainty using Bayesian methods.

We will look at a regression problem for this.

Import and create place holders
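The original code gist is not shown, so here is a minimal sketch. It assumes TensorFlow 2.x with TensorFlow Probability installed; since TF2 runs eagerly by default, TF1-style placeholders go through the compat API, and the shapes below are assumptions for a 1-D regression problem.

```python
import tensorflow as tf

# TF2 is eager by default; placeholders require TF1-style graph mode
tf.compat.v1.disable_eager_execution()

# Placeholders for features and targets of a 1-D regression problem
x = tf.compat.v1.placeholder(tf.float32, shape=[None, 1], name="x")
y = tf.compat.v1.placeholder(tf.float32, shape=[None, 1], name="y")
```

In a pure TF2 workflow you would skip placeholders entirely and pass NumPy arrays (or `tf.data` datasets) straight into the model, which is what the later sketches in this article do.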

Create the model

The DenseFlipout layer tries to find a posterior distribution as close to the prior distribution as possible by using a KL-divergence loss.

While modelling aleatoric uncertainty, we usually model the scale (standard deviation) with a distribution such as a Gamma or inverse-Gamma distribution, because a Normal distribution can produce negative values, and a standard deviation cannot be negative.

https://probflow.readthedocs.io/en/latest/ug_parameters.html#scale-parameter

Calculate the loss:

prediction_distribution is a Normal distribution, and prediction_distribution.log_prob(original_y_value) gives us the log probability density at a particular point. Intuitively, if original_y_value lies in a high-density region of the prediction distribution, the density will be high and neg_log_likelihood will be low (a lower loss). If original_y_value lies far from the predicted distribution, we get a high loss.

KL divergence measures how close two probability distributions are. In the code above, layer.losses gives the KL-divergence loss between the kernel prior and the kernel posterior of that layer.

https://probflow.readthedocs.io/en/latest/ug_math.html

Train the model

I have used randomly generated data here

Sample from learned prediction distribution

As you can see, we now get a measure of how confident the model is when it makes a prediction on test data.
