# Introducing TensorFlow Probability

## Posted by: Josh Dillon, Software Engineer; Mike Shwe, Product Manager; and Dustin Tran, Research Scientist — on behalf of the TensorFlow Probability Team

At the 2018 TensorFlow Developer Summit, we announced TensorFlow Probability: a probabilistic programming toolbox for machine learning researchers and practitioners to quickly and reliably build sophisticated models that leverage state-of-the-art hardware. You should use TensorFlow Probability if:

• You want to build a generative model of data, reasoning about its hidden processes.
• You need to quantify the uncertainty in your predictions, as opposed to predicting a single value.
• Your training set has a large number of features relative to the number of data points.
• Your data is structured — for example, with groups, space, graphs, or language semantics — and you’d like to capture this structure with prior information.
• You have an inverse problem; see this TFDS’18 talk on reconstructing fusion plasmas from measurements.

TensorFlow Probability gives you the tools to solve these problems. In addition, it inherits the strengths of TensorFlow such as automatic differentiation and the ability to scale performance across a variety of platforms: CPUs, GPUs, and TPUs.

# What’s in TensorFlow Probability?

Figure: An overview of TensorFlow Probability. The probabilistic programming toolbox provides benefits for users ranging from data scientists and statisticians to all TensorFlow users.

Layer 0: TensorFlow. Numerical operations. In particular, the `LinearOperator` class enables matrix-free implementations that can exploit special structure (diagonal, low-rank, etc.) for efficient computation. It is built and maintained by the TensorFlow Probability team and is now part of `tf.linalg` in core TF.
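
To illustrate what “matrix-free” buys, here is a pure-Python sketch of a diagonal operator. It is not the `tf.linalg` API (the real class there is `tf.linalg.LinearOperatorDiag`); `DiagOperator` is a hypothetical name used only for illustration. Because it stores just the diagonal, a matrix-vector product, a solve, and a log-determinant each cost O(n) instead of touching an n × n dense matrix.

```python
import math


class DiagOperator:
    """Hypothetical matrix-free stand-in for the diagonal matrix diag(d).

    Only the n diagonal entries are stored; no n x n array is ever built.
    """

    def __init__(self, diag):
        self.diag = list(diag)

    def matvec(self, x):
        # (diag(d) x)_i = d_i * x_i: O(n) work instead of O(n^2).
        return [d * xi for d, xi in zip(self.diag, x)]

    def solve(self, y):
        # diag(d)^{-1} y, assuming every d_i is nonzero.
        return [yi / d for d, yi in zip(self.diag, y)]

    def log_abs_determinant(self):
        # log|det diag(d)| = sum_i log|d_i|.
        return sum(math.log(abs(d)) for d in self.diag)


op = DiagOperator([2.0, 3.0, 4.0])
print(op.matvec([1.0, 1.0, 1.0]))   # [2.0, 3.0, 4.0]
print(op.solve([2.0, 3.0, 4.0]))    # [1.0, 1.0, 1.0]
print(op.log_abs_determinant())     # approximately log(24) = 3.178...
```

The same pattern extends to other structures such as low-rank updates: each operator implements only the operations it can perform cheaply.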

Layer 1: Statistical Building Blocks

• Distributions (`tf.contrib.distributions`, `tf.distributions`): A large collection of probability distributions and related statistics with batch and broadcasting semantics.
• Bijectors (`tf.contrib.distributions.bijectors`): Reversible and composable transformations of random variables. Bijectors provide a rich class of transformed distributions, from classical examples like the log-normal distribution to sophisticated deep learning models such as masked autoregressive flows.
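
The log-normal example rests on the change-of-variables formula: if Y = g(X) for an invertible g, then log p_Y(y) = log p_X(g⁻¹(y)) + log |d g⁻¹(y)/dy|. The pure-Python sketch below applies it with g = exp to a standard normal and checks the result against the closed-form log-normal density. `ExpBijector` and the helper functions are hypothetical names, not the TFP bijector API.

```python
import math


def normal_log_prob(x, loc=0.0, scale=1.0):
    """Log density of N(loc, scale^2) at x."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


class ExpBijector:
    """Hypothetical stand-in for an Exp bijector: Y = exp(X)."""

    def forward(self, x):
        return math.exp(x)

    def inverse(self, y):
        return math.log(y)

    def inverse_log_det_jacobian(self, y):
        # d/dy log(y) = 1 / y, so log|d g^{-1}(y) / dy| = -log(y).
        return -math.log(y)


def transformed_log_prob(y, base_log_prob, bijector):
    """Change of variables: log p_Y(y) = log p_X(g^{-1}(y)) + log|d g^{-1}(y) / dy|."""
    return base_log_prob(bijector.inverse(y)) + bijector.inverse_log_det_jacobian(y)


def log_normal_log_prob(y, mu=0.0, sigma=1.0):
    """Closed-form log-normal log density, used only for comparison."""
    return (-math.log(y * sigma * math.sqrt(2.0 * math.pi))
            - (math.log(y) - mu) ** 2 / (2.0 * sigma ** 2))


exp_bijector = ExpBijector()
for y in [0.5, 1.0, 2.0]:
    print(y, transformed_log_prob(y, normal_log_prob, exp_bijector),
          log_normal_log_prob(y))
```

Chaining several such transformations, each contributing its own log-Jacobian term, is what lets bijectors compose into richer models such as autoregressive flows.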

Layer 2: Model Building

• Edward2 (`tfp.edward2`): A probabilistic programming language for specifying flexible probabilistic models as programs.
• Probabilistic Layers (`tfp.layers`): Neural network layers with uncertainty over the functions they represent, extending TensorFlow Layers.
• Trainable Distributions (`tfp.trainable_distributions`): Probability distributions parameterized by a single Tensor, making it easy to build neural nets that output probability distributions.
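
A rough picture of the trainable-distribution idea: a network emits one unconstrained vector, and a fixed transform turns it into valid distribution parameters. The pure-Python sketch below is illustrative only; `normal_from_tensor` is a hypothetical helper, and the softplus-plus-floor transform for the scale is one common choice, not necessarily the one `tfp.trainable_distributions` uses.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def normal_from_tensor(params):
    """Hypothetical helper: map one unconstrained 2-vector to Normal parameters.

    params[0] becomes the mean unchanged; params[1] goes through softplus
    (plus a small floor) so the scale stays positive whatever the network emits.
    """
    loc, raw_scale = params
    return loc, softplus(raw_scale) + 1e-6


def normal_log_prob(x, loc, scale):
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


# Pretend these two numbers are the final layer of a neural network.
net_output = [0.3, -2.0]
loc, scale = normal_from_tensor(net_output)
# Training would maximize the log-likelihood of an observed target, here 1.0.
print(loc, scale, normal_log_prob(1.0, loc, scale))
```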

Layer 3: Probabilistic Inference

• Markov chain Monte Carlo (`tfp.mcmc`): Algorithms for approximating integrals via sampling. Includes Hamiltonian Monte Carlo, random-walk Metropolis-Hastings, and the ability to build custom transition kernels.
• Variational Inference (`tfp.vi`): Algorithms for approximating integrals via optimization.
• Optimizers (`tfp.optimizer`): Stochastic optimization methods, extending TensorFlow Optimizers. Includes Stochastic Gradient Langevin Dynamics.
• Monte Carlo (`tfp.monte_carlo`): Tools for computing Monte Carlo expectations.
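
To make “approximating integrals via sampling” concrete, here is a minimal random-walk Metropolis-Hastings sampler in pure Python, followed by Monte Carlo estimates of expectations from its samples. It does not use `tfp.mcmc`; the function name, step size, and burn-in length are illustrative assumptions. The target is a standard normal known only up to a constant, so E[x] should come out near 0 and E[x²] near 1.

```python
import math
import random


def random_walk_metropolis(target_log_prob, init, num_samples, step_size, seed=0):
    """Random-walk Metropolis-Hastings for a 1-D target.

    Proposals are x' = x + step_size * N(0, 1). Because the proposal is
    symmetric, the acceptance probability is min(1, p(x') / p(x)), so the
    target's normalizing constant is never needed.
    """
    rng = random.Random(seed)
    x = init
    log_p = target_log_prob(x)
    samples = []
    for _ in range(num_samples):
        proposal = x + step_size * rng.gauss(0.0, 1.0)
        log_p_proposal = target_log_prob(proposal)
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < log_p_proposal - log_p:
            x, log_p = proposal, log_p_proposal
        samples.append(x)  # a rejected proposal repeats the current state
    return samples


def unnormalized_std_normal(x):
    return -0.5 * x * x


samples = random_walk_metropolis(unnormalized_std_normal, init=3.0,
                                 num_samples=50000, step_size=2.0)
kept = samples[1000:]  # discard burn-in

# Monte Carlo expectations: average a function over the kept samples.
mean_x = sum(kept) / len(kept)
mean_x2 = sum(x * x for x in kept) / len(kept)
print(mean_x, mean_x2)  # roughly 0 and 1
```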

Layer 4: Pre-made Models and Inference

• Bayesian structural time series (coming soon): High-level interface for fitting time-series models (similar to R’s bsts package).
• Generalized Linear Mixed Models (coming soon): High-level interface for fitting mixed-effects regression models (similar to R’s lme4 package).

The TensorFlow Probability team is committed to supporting users and contributors with cutting-edge features, continuous code updates, and bug fixes. We’ll continue to add end-to-end examples and tutorials.

# Let’s see some examples!

## Linear Mixed Effects Models with Edward2

As a demonstration, consider the InstEval data set from the popular lme4 package in R, which consists of university courses and their evaluation ratings. Using TensorFlow Probability, we specify the model as an Edward2 probabilistic program (`tfp.edward2`), which extends Edward. The program below reifies the model in terms of its generative process.

```python
import tensorflow as tf
from tensorflow_probability import edward2 as ed

# `num_students` and `num_instructors` (the number of distinct students and
# instructors in the data) are assumed to be defined before `model` is called.


def model(features):
  # Set up fixed effects and other parameters.
  intercept = tf.get_variable("intercept", [])
  service_effects = tf.get_variable("service_effects", [])
  student_stddev_unconstrained = tf.get_variable(
      "student_stddev_pre", [])
  instructor_stddev_unconstrained = tf.get_variable(
      "instructor_stddev_pre", [])
  # Set up random effects; tf.exp keeps the standard deviations positive.
  student_effects = ed.MultivariateNormalDiag(
      loc=tf.zeros(num_students),
      scale_identity_multiplier=tf.exp(
          student_stddev_unconstrained),
      name="student_effects")
  instructor_effects = ed.MultivariateNormalDiag(
      loc=tf.zeros(num_instructors),
      scale_identity_multiplier=tf.exp(
          instructor_stddev_unconstrained),
      name="instructor_effects")
  # Set up likelihood given fixed and random effects.
  ratings = ed.Normal(
      loc=(service_effects * features["service"] +
           tf.gather(student_effects, features["students"]) +
           tf.gather(instructor_effects, features["instructors"]) +
           intercept),
      scale=1.,
      name="ratings")
  return ratings
```

The model takes as input a features dictionary with keys “service”, “students”, and “instructors”; each value is a vector whose elements describe individual courses. The model regresses on these inputs, posits latent random variables, and returns a distribution over the courses’ evaluation ratings. Running this output in a TensorFlow session returns a generated sample of the ratings.
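
The generative process can be mimicked outside TensorFlow to see what sampling from the model produces. The pure-Python sketch below draws the random effects and then one rating per course. `sample_ratings`, the toy features, and the parameter values are all hypothetical, and the standard deviations are passed in already positive (the Edward2 program obtains them with `tf.exp`).

```python
import random


def sample_ratings(features, num_students, num_instructors, params, seed=0):
    """Draw one set of ratings by running the generative process forward."""
    rng = random.Random(seed)
    # Random effects: one draw per student and per instructor, each N(0, stddev^2).
    student_effects = [rng.gauss(0.0, params["student_stddev"])
                       for _ in range(num_students)]
    instructor_effects = [rng.gauss(0.0, params["instructor_stddev"])
                          for _ in range(num_instructors)]
    ratings = []
    for service, student, instructor in zip(
            features["service"], features["students"], features["instructors"]):
        # List indexing plays the role of tf.gather in the Edward2 program.
        loc = (params["service_effects"] * service
               + student_effects[student]
               + instructor_effects[instructor]
               + params["intercept"])
        ratings.append(rng.gauss(loc, 1.0))  # likelihood with scale 1
    return ratings


# Four hypothetical courses, rated by 3 students and taught by 2 instructors.
features = {"service": [1, 0, 1, 0],
            "students": [0, 1, 2, 0],
            "instructors": [0, 0, 1, 1]}
params = {"intercept": 3.0, "service_effects": -0.2,
          "student_stddev": 0.3, "instructor_stddev": 0.5}
print(sample_ratings(features, num_students=3, num_instructors=2, params=params))
```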

Check out the “Linear Mixed Effects Models” tutorial for details on how we train the model using the `tfp.mcmc.HamiltonianMonteCarlo` algorithm, and how we explore and interpret the model using posterior predictions.
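
For intuition about what each Hamiltonian Monte Carlo step does, here is a minimal pure-Python HMC sampler for a one-dimensional target. It is a sketch, not TFP’s implementation: the gradient is supplied by hand rather than by TensorFlow’s automatic differentiation, the momentum is a unit-variance Gaussian, and the step size and number of leapfrog steps are illustrative choices.

```python
import math
import random


def leapfrog(x, p, grad_log_prob, step_size, num_steps):
    """Simulate Hamiltonian dynamics with the leapfrog integrator."""
    p = p + 0.5 * step_size * grad_log_prob(x)    # half step for momentum
    for step in range(num_steps):
        x = x + step_size * p                     # full step for position
        if step < num_steps - 1:
            p = p + step_size * grad_log_prob(x)  # full step for momentum
    p = p + 0.5 * step_size * grad_log_prob(x)    # final half step
    return x, p


def hamiltonian(log_prob, x, p):
    # Potential energy -log p(x) plus kinetic energy p^2 / 2.
    return -log_prob(x) + 0.5 * p * p


def hmc(log_prob, grad_log_prob, init, num_samples,
        step_size=0.15, num_steps=10, seed=0):
    rng = random.Random(seed)
    x = init
    samples = []
    for _ in range(num_samples):
        p0 = rng.gauss(0.0, 1.0)  # resample the momentum
        x_new, p_new = leapfrog(x, p0, grad_log_prob, step_size, num_steps)
        # Metropolis correction for the integrator's discretization error.
        log_accept = hamiltonian(log_prob, x, p0) - hamiltonian(log_prob, x_new, p_new)
        if math.log(1.0 - rng.random()) < log_accept:
            x = x_new
        samples.append(x)
    return samples


# Standard normal target: log p(x) = -x^2 / 2 up to a constant.
samples = hmc(lambda x: -0.5 * x * x, lambda x: -x, init=0.0, num_samples=10000)
print(sum(samples) / len(samples), sum(s * s for s in samples) / len(samples))
```

Because leapfrog nearly conserves the Hamiltonian, most proposals are accepted even though they move far from the current state, which is the main advantage over a random walk.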

## Gaussian Copulas with TFP Bijectors

```python
import tensorflow_probability as tfp

tfd = tfp.distributions
tfb = tfp.distributions.bijectors

# Example: Log-Normal Distribution
log_normal = tfd.TransformedDistribution(
    distribution=tfd.Normal(loc=0., scale=1.),
    bijector=tfb.Exp())

# Example: Kumaraswamy Distribution
kumaraswamy = tfd.TransformedDistribution(
    distribution=tfd.Uniform(low=0., high=1.),
    bijector=tfb.Kumaraswamy(
        concentration1=2.,
        concentration0=2.))

# Example: Masked Autoregressive Flow
# https://arxiv.org/abs/1705.07057
shift_and_log_scale_fn = tfb.masked_autoregressive_default_template(
    hidden_layers=[512, 512])
maf = tfd.TransformedDistribution(
    distribution=tfd.Normal(loc=0., scale=1.),
    bijector=tfb.MaskedAutoregressiveFlow(
        shift_and_log_scale_fn=shift_and_log_scale_fn),
    # Broadcast the scalar Normal base to a 784-dimensional (28 x 28) event.
    event_shape=[28 * 28])
```

The “Gaussian Copula” tutorial creates a few custom bijectors and then shows how to easily build several different copulas. For more background on distributions, see “Understanding TensorFlow Distributions Shapes,” which describes how to manage shapes for sampling, batch training, and modeling events.
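
The idea behind a Gaussian copula can be shown without any bijector classes. The pure-Python sketch below samples a correlated bivariate normal through its Cholesky factor and pushes each coordinate through the standard normal CDF, which yields uniform marginals that keep the dependence set by `rho`. The function name and the restriction to two dimensions are assumptions for illustration; the tutorial itself works with custom bijectors, and this sketch only mirrors the underlying math.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sample_gaussian_copula(rho, num_samples, seed=0):
    """Draw (u1, u2) pairs from a bivariate Gaussian copula with correlation rho."""
    rng = random.Random(seed)
    # Cholesky factor of [[1, rho], [rho, 1]] is [[1, 0], [rho, sqrt(1 - rho^2)]].
    chol_21 = rho
    chol_22 = math.sqrt(1.0 - rho * rho)
    pairs = []
    for _ in range(num_samples):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z1 = e1
        z2 = chol_21 * e1 + chol_22 * e2    # (z1, z2) has correlation rho
        pairs.append((normal_cdf(z1), normal_cdf(z2)))  # uniform marginals
    return pairs


print(sample_gaussian_copula(0.8, 3))
```

Applying any inverse CDF to each coordinate then gives other marginals with the same dependence structure; for example, `-math.log(1.0 - u)` turns a uniform `u` into a unit-rate exponential draw.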

## Variational Autoencoder with TFP Utilities

```python
import tensorflow as tf
import tensorflow_probability as tfp

# Assumes the user supplies `likelihood`, `prior`, and `surrogate_posterior`
# functions that each return a tf.distributions.Distribution-like object,
# and that `x` holds the observed data.
elbo_loss = tfp.vi.monte_carlo_csiszar_f_divergence(
    # Reverse KL; minimizing it maximizes the Evidence Lower BOund (ELBO).
    f=tfp.vi.kl_reverse,
    p_log_prob=lambda z: likelihood(z).log_prob(x) + prior().log_prob(z),
    q=surrogate_posterior(x),
    num_draws=1)
train = tf.train.AdamOptimizer(
    learning_rate=0.01).minimize(elbo_loss)
```

To see more details, check out our variational autoencoder example!
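
The quantity being optimized in a variational autoencoder can be checked by hand on a toy model. In the pure-Python sketch below (hypothetical names, not the `tfp.vi` API), z ~ N(0, 1) and x | z ~ N(z, 1), and each Monte Carlo draw from a surrogate q(z) contributes log p(x | z) + log p(z) - log q(z). Averaging those terms estimates the ELBO; with an unnormalized log joint as `p_log_prob`, the reverse-KL Csiszár divergence estimates the negative of this quantity, so minimizing it maximizes the ELBO. When q is the exact posterior, every single draw equals log p(x).

```python
import math
import random


def normal_log_prob(x, loc, scale):
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


def elbo_estimate(x, q_loc, q_scale, num_draws=1, seed=0):
    """Monte Carlo ELBO for the toy model z ~ N(0, 1), x | z ~ N(z, 1).

    The surrogate posterior is q(z) = N(q_loc, q_scale^2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_draws):
        z = rng.gauss(q_loc, q_scale)                    # z ~ q(z)
        total += (normal_log_prob(x, z, 1.0)             # log p(x | z), likelihood
                  + normal_log_prob(z, 0.0, 1.0)         # log p(z), prior
                  - normal_log_prob(z, q_loc, q_scale))  # log q(z)
    return total / num_draws


x = 1.3
# For this conjugate model the exact posterior is N(x / 2, 1 / 2)
# and the evidence is log p(x) = log N(x; 0, 2).
log_evidence = normal_log_prob(x, 0.0, math.sqrt(2.0))
exact_q = elbo_estimate(x, x / 2, math.sqrt(0.5))        # equals log p(x)
prior_q = elbo_estimate(x, 0.0, 1.0, num_draws=10000)    # strictly lower
print(log_evidence, exact_q, prior_q)
```

The gap between `prior_q` and `log_evidence` is the KL divergence from the surrogate to the true posterior, which is exactly what training the surrogate tries to shrink.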

## Bayesian Neural Networks with TFP Probabilistic Layers

As a demonstration, consider the CIFAR-10 dataset, which has features (images of shape 32 x 32 x 3) and labels (values from 0 to 9). To fit the neural network, we’ll use variational inference, a suite of methods that approximate the neural network’s posterior distribution over weights and biases. Specifically, we use the recently published Flipout estimator in the TensorFlow Probability layers module (`tfp.layers`).

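```python
import tensorflow as tf
import tensorflow_probability as tfp

# `features` (a batch of CIFAR-10 images) and `labels` (their one-hot
# classes) are assumed to come from an input pipeline.
model = tf.keras.Sequential([
    tf.keras.layers.Reshape([32, 32, 3]),
    tfp.layers.Convolution2DFlipout(
        64, kernel_size=5, padding='SAME', activation=tf.nn.relu),
    tf.keras.layers.MaxPooling2D(pool_size=[2, 2],
                                 strides=[2, 2],
                                 padding='SAME'),
    tf.keras.layers.Reshape([16 * 16 * 64]),
    tfp.layers.DenseFlipout(10)])

logits = model(features)
# Average the per-example cross-entropy over the batch to get a scalar.
neg_log_likelihood = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
# KL terms registered as losses by the Flipout layers. In practice this sum
# is often divided by the number of training examples so that it is on the
# same per-example scale as the averaged negative log-likelihood.
kl = sum(model.get_losses_for(inputs=None))
loss = neg_log_likelihood + kl
train_op = tf.train.AdamOptimizer().minimize(loss)
```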

The model object composes neural net layers on an input tensor and performs stochastic forward passes through the probabilistic convolutional layer and the probabilistic densely connected layer. It returns an output tensor of shape [batch size, 10]. Each row of this tensor holds the logits (unnormalized log-probabilities) of a data point belonging to each of the 10 classes.

For training, we build the loss function, which comprises two terms: the expected negative log-likelihood and the KL divergence. We approximate the expected negative log-likelihood via Monte Carlo. The KL divergence is added via regularizer terms, which are arguments to the layers.
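
Stripped of TensorFlow, this loss has a simple shape. The pure-Python sketch below assumes a mean-field Gaussian posterior N(μ, σ²) for each weight and a standard normal prior, which gives the per-weight KL divergence in closed form; the Flipout layers may compute or estimate their KL terms differently. Dividing the summed KL by the training-set size, so it sits on the same per-example scale as the averaged negative log-likelihood, is a common convention and an assumption here. All function names are hypothetical.

```python
import math


def softmax_cross_entropy(logits, label):
    """-log softmax(logits)[label], computed stably by subtracting the max."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_norm - logits[label]


def kl_normal_std_normal(mu, sigma):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)) for a single weight."""
    return 0.5 * (sigma * sigma + mu * mu - 1.0) - math.log(sigma)


def bnn_loss(batch_logits, batch_labels, weight_posteriors, num_train_examples):
    """Negative ELBO per example: mean NLL over the batch plus scaled KL.

    The logits come from one stochastic forward pass, so the mean NLL is a
    single-sample Monte Carlo estimate of the expected negative log-likelihood.
    """
    nll = sum(softmax_cross_entropy(lg, y)
              for lg, y in zip(batch_logits, batch_labels)) / len(batch_labels)
    kl = sum(kl_normal_std_normal(mu, sigma) for mu, sigma in weight_posteriors)
    return nll + kl / num_train_examples


# Two examples with uninformative logits and two weights' (mu, sigma) pairs.
print(bnn_loss([[0.0, 0.0], [2.0, -1.0]], [0, 0],
               [(0.0, 1.0), (0.5, 0.8)], num_train_examples=50000))
```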

`tfp.layers` can also be used with eager execution using the `tf.keras.Model` class.

```python
import tensorflow as tf
import tensorflow_probability as tfp


class MNISTModel(tf.keras.Model):

  def __init__(self):
    super(MNISTModel, self).__init__()
    self.dense1 = tfp.layers.DenseFlipout(units=10)
    self.dense2 = tfp.layers.DenseFlipout(units=10)

  def call(self, inputs):
    """Run the model."""
    result = self.dense1(inputs)
    result = self.dense2(result)
    # Reuse variables from the dense2 layer.
    result = self.dense2(result)
    return result


model = MNISTModel()
```

# Getting started

```shell
pip install --user --upgrade tfp-nightly
```

For all the code and details, check out github.com/tensorflow/probability. We’re excited to collaborate with you via GitHub, whether you’re a user or contributor!
