Reasonable Doubt: Get Onto the Top 35 MNIST Leaderboard by Quantifying Aleatoric Uncertainty

By Bayan Bruss, Jason Wittenbach, and James Montgomery, Capital One

Capital One Tech
Dec 6, 2018 · 10 min read

Defining Uncertainty

Figure 1: (a) A data generating distribution that will produce data with heteroscedastic uncertainty. Data is generated by first sampling an x-value from a uniform distribution and then drawing a y-value from p(y|x), which is a Gaussian where both the mean (solid line) and variance (shaded region represents two standard deviations) vary as a function of x. (b) Data generated from the process in (a). (c) Fitting a neural network to this data shows good overall prediction but moderate overfitting, especially in regions of high uncertainty. (d) Adding weight decay and tuning its strength prevents overfitting, but requires a search over decay strengths. (e) Fitting a neural network with an estimate of aleatoric uncertainty provides confidence estimates on predictions and prevents a large amount of the overfitting seen with the traditional neural network (red shaded region represents model prediction of two standard deviations). (f) Adding weight decay to the model and tuning its strength is still helpful to smooth out the uncertainty prediction.
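To make the generating process in Figure 1(a) and 1(b) concrete, here is a minimal sketch using only the Python standard library. The mean curve `mean_fn` and the noise curve `std_fn` are hypothetical stand-ins chosen for illustration, since the caption does not give the exact functions; only the two-step sampling scheme (uniform x, then Gaussian y with x-dependent mean and spread) comes from the caption.

```python
import math
import random


def mean_fn(x):
    # Hypothetical mean curve; the article's exact function is not given.
    return math.sin(2.0 * math.pi * x)


def std_fn(x):
    # Hypothetical noise level that grows with x, so the noise is
    # heteroscedastic (its size depends on the input).
    return 0.05 + 0.5 * x


def sample_dataset(n, seed=0):
    """Draw n (x, y) pairs: x ~ Uniform(0, 1), y ~ N(mean_fn(x), std_fn(x)^2)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        # random.gauss takes the standard deviation, not the variance.
        y = rng.gauss(mean_fn(x), std_fn(x))
        data.append((x, y))
    return data


if __name__ == "__main__":
    for x, y in sample_dataset(5):
        print(f"x={x:.3f}  y={y:.3f}  noise std at x={std_fn(x):.3f}")
```

With these assumed curves, points drawn at larger x scatter more widely around the mean, which is the kind of region where the caption reports the most overfitting for a standard network.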
Figure 2: (a) A high-level depiction of the computational graph for a standard neural network. The input (X) is transformed through multiple hidden layers (Hk), and the final layer produces an estimate (Ŷ) of the target (Y). The loss (L) is then computed as the mean-squared error between Ŷ and Y. (b) To model aleatoric uncertainty, we split the neural network into two streams, the first of which produces an estimate of a mean (μ) while the second produces an estimate of a standard deviation (σ). Assuming that these parameterize a Gaussian distribution (N(μ, σ)), we can then define the loss as the negative log likelihood of the target under this distribution. (c) For a classification problem with k classes, the standard approach is to have the network output k weights (w) that represent the evidence for each class. These weights are then passed through a softmax function to produce probabilities for each class (p), and the loss is the cross-entropy between the probability vector and the one-hot target. We can model aleatoric uncertainty in this setting by having our network produce the parameters for a k-dimensional Gaussian distribution over the evidence weights (instead of the outputs). We then minimize the average cross-entropy loss over samples from this distribution.
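The regression loss in Figure 2(b) can be written out in a few lines. The sketch below is plain Python with no deep learning framework, and the function names `gaussian_nll` and `mean_gaussian_nll` are illustrative rather than taken from the article. It assumes the network has already produced a mean `mu` and a positive standard deviation `sigma` for each target.

```python
import math


def gaussian_nll(y, mu, sigma):
    """Negative log likelihood of target y under N(mu, sigma^2)."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    return (
        0.5 * math.log(2.0 * math.pi)          # constant term
        + math.log(sigma)                      # penalizes large predicted spread
        + 0.5 * ((y - mu) / sigma) ** 2        # squared error scaled by spread
    )


def mean_gaussian_nll(targets, means, sigmas):
    """Average NLL over a batch of (target, mean, sigma) triples."""
    losses = [gaussian_nll(y, m, s) for y, m, s in zip(targets, means, sigmas)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # For a fixed residual of 2.0, the loss is lowest when sigma matches it.
    for sigma in (1.0, 2.0, 4.0):
        print(sigma, round(gaussian_nll(2.0, 0.0, sigma), 4))
```

If `sigma` is held constant, the loss reduces to a scaled mean-squared error plus a constant, which recovers the standard setup in Figure 2(a). Letting `sigma` vary allows the model to discount large errors where it predicts high noise, at the cost of the `log(sigma)` term.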

Modeling Uncertainty in Deep Neural Networks


MNIST Example

Figure 3: Sample code for modifying a traditional MNIST convolutional neural network to predict aleatoric uncertainty.
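As a framework-free illustration of the classification loss described in Figure 2(c), the sketch below averages the cross-entropy over samples of the softmax inputs. It is not the authors' listing. It assumes a diagonal Gaussian (one mean and one standard deviation per class) and uses hypothetical names `log_softmax` and `sampled_cross_entropy`; in a real model, `mu` and `sigma` would be the two output heads of the CNN.

```python
import math
import random


def log_softmax(logits):
    """Numerically stable log of the softmax probabilities."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_norm for v in logits]


def sampled_cross_entropy(mu, sigma, label, n_samples=100, rng=None):
    """Average cross-entropy over logits sampled from N(mu, diag(sigma^2)).

    mu, sigma: per-class means and standard deviations (assumed diagonal).
    label: index of the true class.
    """
    if rng is None:
        rng = random.Random(0)
    total = 0.0
    for _ in range(n_samples):
        # Reparameterized sample: logits = mu + sigma * eps, eps ~ N(0, 1).
        logits = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
        # Cross-entropy with a one-hot target is -log p[label].
        total += -log_softmax(logits)[label]
    return total / n_samples


if __name__ == "__main__":
    mu = [2.0, 0.0, 0.0]
    print(sampled_cross_entropy(mu, [0.0, 0.0, 0.0], label=0))  # no noise
    print(sampled_cross_entropy(mu, [1.0, 1.0, 1.0], label=0))  # noisy logits
```

Writing each sample as `mu + sigma * eps` keeps the noise source separate from the predicted parameters, which is what lets gradients flow to both heads when the same computation is expressed in an automatic differentiation framework.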
Figure 4: Network architecture of the Bayesian reparameterized MNIST CNN.
Figure 5: Distributions over softmax inputs for two example digits (eight and three). Each row contains the predicted Gaussian distribution over the softmax input for that digit. Values further to the right indicate stronger evidence for that digit. Units are not included, as only relative distances matter prior to the softmax transformation.
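The remark that only relative distances matter before the softmax can be checked directly: adding the same constant to every softmax input leaves the output probabilities unchanged. A minimal sketch in plain Python, with made-up input values:

```python
import math


def softmax(logits):
    """Softmax with the usual max-subtraction for numerical stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative values only; not taken from the article's figures.
logits = [1.0, 3.0, 0.5]
shifted = [v + 10.0 for v in logits]  # same gaps, different absolute scale

probs = softmax(logits)
probs_shifted = softmax(shifted)

if __name__ == "__main__":
    print(probs)
    print(probs_shifted)
```

Both calls yield the same probabilities, so the horizontal position of the distributions in the figure carries meaning only relative to the other rows.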
Figure 6: Distributions over softmax inputs for two example digits (nine and six). Only the relevant subset of softmax inputs is shown here to highlight overlap.

Conclusion

More reading on the topic

