# The TensorFlow Way (Part 2)

## Loss Functions

Oct 7

# Implementing Loss Functions

Loss functions are central to machine learning algorithms. They measure the distance between the model outputs and the target (truth) values.

In order to optimize our machine learning algorithms, we will need to evaluate the outcomes. Evaluating outcomes in TensorFlow depends on specifying a loss function. A loss function tells TensorFlow how good or bad the predictions are compared to the desired result. In most cases, we will have a set of data and a target on which to train our algorithm. The loss function compares the target to the prediction and gives a numerical distance between the two.

# How to do it…

## Loss functions for regression

Regression loss functions are used when predicting a continuous dependent variable. To start, we will create a sequence of predictions and a target as tensors. We will output the results across 500 x-values between -1 and 1.

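```python
import matplotlib.pyplot as plt
import tensorflow as tf

# The examples use TensorFlow 1.x graph mode: build ops, then evaluate them in a Session.
sess = tf.Session()

x_vals = tf.linspace(-1., 1., 500)
target = tf.constant(0.)
```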

## L2 norm loss

The L2 norm loss is also known as the Euclidean loss function. It is the square of the distance to the target. Here we compute the loss function as if the target is zero. The L2 norm is a useful loss function because it is very curved near the target, and algorithms can use this fact to converge to the target more slowly the closer they get.

```python
l2_y_vals = tf.square(target - x_vals)
l2_y_out = sess.run(l2_y_vals)
```

TensorFlow has a built-in form of the L2 norm, called `tf.nn.l2_loss()`. It computes half the sum of the squares of its input's elements and returns a scalar. In other words, for a single residual it gives the same value as the loss above divided by 2.
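To make the factor of 2 concrete, here is a small NumPy sketch. NumPy is used only for illustration and is not part of the original recipe. The helper `l2_loss` is a hypothetical stand-in that follows the documented behaviour of `tf.nn.l2_loss`, which is `sum(t ** 2) / 2` over all elements.

```python
import numpy as np

# Same grid as the TensorFlow example: 500 predictions between -1 and 1.
x_vals = np.linspace(-1.0, 1.0, 500)
target = 0.0

# Element-wise L2 (squared) loss, as with tf.square(target - x_vals).
l2_y_vals = np.square(target - x_vals)

def l2_loss(t):
    """Hypothetical NumPy stand-in for tf.nn.l2_loss: half the sum of squares."""
    return np.sum(np.square(t)) / 2.0

residual = np.array([0.5])
print(np.square(residual)[0])  # squared loss for one residual
print(l2_loss(residual))       # half of that value
```

For a whole tensor, `l2_loss(target - x_vals)` equals half of `l2_y_vals.sum()`, because the built-in function reduces to a single number.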

## L1 norm loss

The L1 norm loss is also known as the absolute loss function. Instead of squaring the difference, we take the absolute value. The L1 norm is more robust to outliers than the L2 norm because it grows linearly rather than quadratically, so it is not as steep for larger errors. One issue to be aware of is that the L1 norm is not smooth at the target, which can keep algorithms from converging well. It is computed as follows:

```python
l1_y_vals = tf.abs(target - x_vals)
l1_y_out = sess.run(l1_y_vals)
```
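A quick NumPy check makes both claims from the L1 paragraph concrete. The residuals are hypothetical values chosen for illustration, and NumPy stands in for TensorFlow. First, a single large residual dominates the total L2 loss much more than the total L1 loss. Second, the slope of the absolute value jumps from -1 to +1 at the target instead of shrinking toward zero.

```python
import numpy as np

# Hypothetical residuals (target - prediction): four small errors and one outlier.
residuals = np.array([0.1, -0.2, 0.15, -0.05, 5.0])

l1 = np.abs(residuals)     # L1 (absolute) loss per point
l2 = np.square(residuals)  # L2 (squared) loss per point

# Fraction of the total loss contributed by the outlier.
print(l1[-1] / l1.sum())  # about 0.91
print(l2[-1] / l2.sum())  # about 0.997

# One-sided slopes of |a| at the target: -1 on the left, +1 on the right,
# so the gradient does not shrink as predictions approach the target.
eps = 1e-6
left_slope = (np.abs(0.0) - np.abs(-eps)) / eps
right_slope = (np.abs(eps) - np.abs(0.0)) / eps
print(left_slope, right_slope)
```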

## Pseudo-Huber

Pseudo-Huber loss is a continuous and smooth approximation to the Huber loss function. This loss function attempts to take the best of the L1 and L2 norms. It is curved (L2-like) near the target and less steep (L1-like) for extreme values. Its form depends on an extra parameter, delta, which dictates how steep it will be. We will compute two forms, delta1 = 0.25 and delta2 = 5, to show the difference:

```python
# Pseudo-Huber loss: delta^2 * (sqrt(1 + ((target - x) / delta)^2) - 1)
delta1 = tf.constant(0.25)
phuber1_y_vals = tf.multiply(tf.square(delta1), tf.sqrt(1. + tf.square((target - x_vals) / delta1)) - 1.)
phuber1_y_out = sess.run(phuber1_y_vals)

delta2 = tf.constant(5.)
phuber2_y_vals = tf.multiply(tf.square(delta2), tf.sqrt(1. + tf.square((target - x_vals) / delta2)) - 1.)
phuber2_y_out = sess.run(phuber2_y_vals)
```
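The code above uses the formula delta² · (sqrt(1 + (a / delta)²) - 1), with a = target - prediction. The NumPy sketch below implements the same formula to show how delta controls the shape; `pseudo_huber` is a hypothetical helper, and NumPy replaces TensorFlow for brevity. When |a| is much smaller than delta, the loss is close to a²/2, which is L2-like. When |a| is much larger than delta, it grows roughly linearly with slope close to delta, which is L1-like.

```python
import numpy as np

def pseudo_huber(a, delta):
    """Pseudo-Huber loss delta**2 * (sqrt(1 + (a / delta)**2) - 1), element-wise."""
    return delta**2 * (np.sqrt(1.0 + np.square(a / delta)) - 1.0)

# Near the target (|a| << delta) it behaves like a**2 / 2 ...
a_small = 0.01
print(pseudo_huber(a_small, 5.0), a_small**2 / 2)

# ... and far from it (|a| >> delta) it grows linearly with slope close to delta.
a_big = 100.0
slope = pseudo_huber(a_big + 1.0, 0.25) - pseudo_huber(a_big, 0.25)
print(slope)  # close to 0.25
```

This explains why delta = 0.25 gives a curve that becomes almost straight soon after leaving the target. With delta = 5, the curve stays nearly quadratic over the whole [-1, 1] range.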

## Classification loss

Classification loss functions are used to evaluate loss when predicting categorical outcomes.

To start, we will redefine our predictions (x_vals) and target, and create a vector of targets for the built-in TensorFlow functions. We will save the outputs and plot them in the next section. Use the following:

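```python
x_vals = tf.linspace(-3., 5., 500)
target = tf.constant(1.)          # scalar target for the hand-written losses
targets = tf.fill([500,], 1.)     # one target per prediction for the tf.nn functions
```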

## Hinge loss

Hinge loss is mostly used for support vector machines but can be used in neural networks as well. It is meant to compute a loss between two target classes, 1 and -1. In the following code, we use the target value 1, so the closer our predictions are to 1, the lower the loss value (it reaches zero once a prediction is 1 or greater):

```python
hinge_y_vals = tf.maximum(0., 1. - tf.multiply(target, x_vals))
hinge_y_out = sess.run(hinge_y_vals)
```
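Hinge loss is defined for targets of 1 and -1. A small NumPy sketch with a hypothetical `hinge` helper (an illustration, not the recipe's TensorFlow code) makes the margin visible. With target 1, the loss is zero once the prediction reaches 1. With target -1, the picture is mirrored.

```python
import numpy as np

def hinge(target, pred):
    """Hinge loss max(0, 1 - target * pred) for targets in {1, -1}."""
    return np.maximum(0.0, 1.0 - target * pred)

preds = np.array([-1.0, 0.0, 0.5, 1.0, 2.0])
print(hinge(1.0, preds))   # losses: 2, 1, 0.5, 0, 0
print(hinge(-1.0, preds))  # losses: 0, 1, 1.5, 2, 3
```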

## Cross-entropy loss

Cross-entropy loss for a binary case is also sometimes referred to as the logistic loss function. It arises when we are predicting the two classes 0 or 1. We wish to measure a distance from the actual class (0 or 1) to the predicted value, which is usually a real number between 0 and 1. To measure this distance, we can use the cross-entropy formula from information theory, as follows:

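```python
# Only finite for predictions strictly between 0 and 1; tf.log of a negative number gives NaN.
xentropy_y_vals = - tf.multiply(target, tf.log(x_vals)) - tf.multiply((1. - target), tf.log(1. - x_vals))
xentropy_y_out = sess.run(xentropy_y_vals)
```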
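This formula takes the logarithm of the prediction and of one minus the prediction, so it is only defined for predictions strictly between 0 and 1. A common workaround is to clip predictions into that interval before taking logs. The NumPy sketch below shows this approach. The helper name `binary_cross_entropy` and the clipping constant `eps` are assumptions chosen for illustration and do not come from the original recipe.

```python
import numpy as np

def binary_cross_entropy(target, pred, eps=1e-7):
    """-t*log(p) - (1-t)*log(1-p), with p clipped to [eps, 1-eps] (eps is an assumed constant)."""
    p = np.clip(pred, eps, 1.0 - eps)
    return -target * np.log(p) - (1.0 - target) * np.log(1.0 - p)

preds = np.array([0.1, 0.5, 0.9, 1.5])
# With target 1, the loss shrinks as the prediction approaches 1;
# the out-of-range value 1.5 is clipped instead of producing NaN.
print(binary_cross_entropy(1.0, preds))
```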

## Sigmoid cross-entropy loss

Sigmoid cross-entropy loss is very similar to the previous loss function, except that we transform the x-values with the sigmoid function before putting them into the cross-entropy loss, as follows:

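```python
# TensorFlow 1.x requires the labels= and logits= keyword arguments here.
xentropy_sigmoid_y_vals = tf.nn.sigmoid_cross_entropy_with_logits(labels=targets, logits=x_vals)
xentropy_sigmoid_y_out = sess.run(xentropy_sigmoid_y_vals)
```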
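The TensorFlow documentation for `tf.nn.sigmoid_cross_entropy_with_logits` gives a numerically stable rearrangement of "sigmoid, then cross-entropy": `max(x, 0) - x * z + log(1 + exp(-abs(x)))`, where x is the logits and z is the labels. The NumPy sketch below is an illustration, not TensorFlow's source. It checks that this form matches the naive two-step computation on the same x-values used above.

```python
import numpy as np

x = np.linspace(-3.0, 5.0, 500)  # logits, as in x_vals above
z = np.ones_like(x)              # labels, as in targets above

# Naive version: squash with the sigmoid, then apply binary cross-entropy.
p = 1.0 / (1.0 + np.exp(-x))
naive = -z * np.log(p) - (1.0 - z) * np.log(1.0 - p)

# Stable form documented for tf.nn.sigmoid_cross_entropy_with_logits.
stable = np.maximum(x, 0.0) - x * z + np.log1p(np.exp(-np.abs(x)))

print(np.max(np.abs(naive - stable)))  # tiny: the two forms agree on this range
```

The stable form never takes the logarithm of a value that has underflowed to zero. It therefore stays finite for large positive or negative logits, where the naive version can return inf or NaN.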

## Weighted cross-entropy loss

Weighted cross-entropy loss is a weighted version of the sigmoid cross-entropy loss, in which the weight is applied to the positive target. For example, we will weight the positive target by 0.5, as follows:

```python
weight = tf.constant(0.5)
# Positional argument order is (targets/labels, logits, pos_weight).
xentropy_weighted_y_vals = tf.nn.weighted_cross_entropy_with_logits(targets, x_vals, weight)
xentropy_weighted_y_out = sess.run(xentropy_weighted_y_vals)
```
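According to the documentation for `tf.nn.weighted_cross_entropy_with_logits`, `pos_weight` multiplies only the positive-target term: `pos_weight * z * -log(sigmoid(x)) + (1 - z) * -log(1 - sigmoid(x))`. The NumPy sketch below uses the naive form of that expression for readability. The function names are hypothetical, and the code is not numerically stable for very large logits. Because every target in this recipe is 1, a weight of 0.5 simply halves the loss.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def weighted_xentropy(z, x, pos_weight):
    """pos_weight * z * -log(sigmoid(x)) + (1 - z) * -log(1 - sigmoid(x))."""
    p = sigmoid(x)
    return pos_weight * z * -np.log(p) + (1.0 - z) * -np.log(1.0 - p)

x = np.linspace(-3.0, 5.0, 500)
z = np.ones_like(x)

weighted = weighted_xentropy(z, x, 0.5)
unweighted = weighted_xentropy(z, x, 1.0)
# With all-positive targets, a weight of 0.5 halves the loss everywhere.
print(np.allclose(weighted, 0.5 * unweighted))  # True
```

For a negative target (z = 0), the weight has no effect. A value above 1 therefore penalizes missed positives more heavily, and a value below 1 penalizes them less.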

## Softmax cross-entropy loss

Softmax cross-entropy loss operates on non-normalized outputs (logits). It is used to measure a loss when each observation has only one target category rather than multiple. To do this, the function transforms the outputs into a probability distribution via the softmax function and then computes the loss against a true probability distribution, as follows:

```python
unscaled_logits = tf.constant([[1., -3., 10.]])
target_dist = tf.constant([[0.1, 0.02, 0.88]])
softmax_xentropy = tf.nn.softmax_cross_entropy_with_logits(labels=target_dist, logits=unscaled_logits)
print(sess.run(softmax_xentropy))
# [ 1.16012561]
```
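The printed value can be reproduced by hand. The NumPy sketch below is for illustration only; the helper name is hypothetical, and TensorFlow's kernel is implemented separately. It applies a log-softmax to the logits and takes the cross-entropy against the target distribution.

```python
import numpy as np

def softmax_cross_entropy(logits, target_dist):
    """-sum(target * log_softmax(logits)) along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # subtract the max for stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -(target_dist * log_probs).sum(axis=-1)

unscaled_logits = np.array([[1.0, -3.0, 10.0]])
target_dist = np.array([[0.1, 0.02, 0.88]])
loss = softmax_cross_entropy(unscaled_logits, target_dist)
print(loss)  # approximately 1.16013, matching the TensorFlow output above
```

Subtracting the row maximum before exponentiating does not change the softmax, but it keeps `np.exp` from overflowing for large logits.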

## Sparse softmax cross-entropy loss

Sparse softmax cross-entropy loss is the same as the previous loss, except that the target is the index of the true category instead of a probability distribution. Instead of a one-hot target vector (all zeros except for a single one), we just pass in the index of the true category, as follows:

```python
unscaled_logits = tf.constant([[1., -3., 10.]])
sparse_target_dist = tf.constant([2])
sparse_xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=sparse_target_dist, logits=unscaled_logits)
print(sess.run(sparse_xentropy))
# [ 0.00012564]
```
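The equivalence between the sparse and dense forms can be checked with a short NumPy sketch. The helper names are hypothetical. Indexing the log-probabilities at the true category gives the same result as a dense cross-entropy against the corresponding one-hot vector.

```python
import numpy as np

def log_softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

unscaled_logits = np.array([[1.0, -3.0, 10.0]])
sparse_target = np.array([2])  # index of the true category for each row

# Sparse form: pick out -log p[true index] for each row.
rows = np.arange(len(sparse_target))
sparse_loss = -log_softmax(unscaled_logits)[rows, sparse_target]

# Equivalent dense form: cross-entropy against a one-hot target distribution.
one_hot = np.eye(unscaled_logits.shape[-1])[sparse_target]
dense_loss = -(one_hot * log_softmax(unscaled_logits)).sum(axis=-1)

print(sparse_loss, dense_loss)  # both approximately 0.000126
```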

# How it works…

Use matplotlib to plot the regression loss functions:

```python
# x_vals now holds the classification grid, so rebuild the regression grid (-1 to 1) for the x-axis.
x_array = sess.run(tf.linspace(-1., 1., 500))
plt.plot(x_array, l2_y_out, 'b-', label='L2 Loss')
plt.plot(x_array, l1_y_out, 'r--', label='L1 Loss')
plt.plot(x_array, phuber1_y_out, 'k-.', label='P-Huber Loss (0.25)')
plt.plot(x_array, phuber2_y_out, 'g:', label='P-Huber Loss (5.0)')
plt.ylim(-0.2, 0.4)
plt.legend(loc='lower right', prop={'size': 11})
plt.show()
```

Use matplotlib to plot the various classification loss functions:

```python
x_array = sess.run(x_vals)
plt.plot(x_array, hinge_y_out, 'b-', label='Hinge Loss')
plt.plot(x_array, xentropy_y_out, 'r--', label='Cross Entropy Loss')
plt.plot(x_array, xentropy_sigmoid_y_out, 'k-.', label='Cross Entropy Sigmoid Loss')
plt.plot(x_array, xentropy_weighted_y_out, 'g:', label='Weighted Cross Entropy Loss (x0.5)')
plt.ylim(-1.5, 3)
plt.legend(loc='lower right', prop={'size': 11})
plt.show()
```

# Summary

Table: summary of the different loss functions described above.

Most of the classification loss functions described here are for two-class predictions. They can be extended to multiple classes by summing the cross-entropy terms over each prediction/target pair.
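One way to read this extension: binary cross-entropy is already a sum over two classes, -Σ t_k log p_k, with targets t = (1 - y, y) and predictions (1 - p, p). With more than two classes, the same sum simply runs over more target/prediction pairs. The NumPy sketch below checks the two-class case and then applies the sum to three classes. The helper names and the example probabilities are hypothetical.

```python
import numpy as np

def categorical_xentropy(t, p):
    """-sum_k t_k * log(p_k) over the class axis."""
    return -(t * np.log(p)).sum(axis=-1)

def binary_xentropy(y, p):
    return -y * np.log(p) - (1.0 - y) * np.log(1.0 - p)

y, p = 1.0, 0.8
two_class = categorical_xentropy(np.array([1.0 - y, y]), np.array([1.0 - p, p]))
print(np.isclose(two_class, binary_xentropy(y, p)))  # True

# Three classes: the same sum, now over three target/prediction pairs.
print(categorical_xentropy(np.array([0.0, 1.0, 0.0]), np.array([0.2, 0.7, 0.1])))
```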
