The TensorFlow Way (Part 2)

Loss Functions

Bhanu Soni
Oct 7 · 5 min read

Implementing Loss Functions

Loss functions are very important to machine learning algorithms. They measure the distance between the model outputs and the target (truth) values.

Getting ready…

In order to optimize our machine learning algorithms, we will need to evaluate the outcomes. Evaluating outcomes in TensorFlow depends on specifying a loss function. A loss function tells TensorFlow how good or bad the predictions are compared to the desired result. In most cases, we will have a set of data and a target on which to train our algorithm. The loss function compares the target to the prediction and gives a numerical distance between the two.

How to do it…

Loss functions for regression

Regression means predicting a continuous dependent variable. To start, we will create a sequence of predictions and a target as tensors. We will output the results across 500 x-values between -1 and 1.
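As a dependency-free sketch of this setup (pure Python rather than TensorFlow, so the values are explicit; the TF1 equivalent would be tf.linspace(-1., 1., 500) with a constant target):

```python
# Setup described above: 500 evenly spaced prediction values between
# -1 and 1, and a target of zero (the target used in the L2 section).
n = 500
x_vals = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
target = 0.0
```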

L2 norm loss

The L2 norm loss is also known as the Euclidean loss function. It is just the square of the distance to the target. Here we will compute the loss function as if the target were zero. The L2 norm is a great loss function because it is very curved near the target, so optimization algorithms take smaller and smaller steps the closer they get to the target.

TensorFlow has a built-in form of the L2 norm, called tf.nn.l2_loss(). This function is actually half the L2 norm above; in other words, it is the same as before but divided by 2.
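The two conventions can be sketched in plain Python (scalar versions of the formulas; note that tf.nn.l2_loss also sums over all elements of the tensor):

```python
def l2_loss(pred, target):
    """Squared Euclidean distance to the target."""
    return (pred - target) ** 2

def l2_loss_halved(pred, target):
    """The tf.nn.l2_loss convention: half the squared distance."""
    return 0.5 * (pred - target) ** 2
```

With a target of zero, a prediction of 0.5 gives a loss of 0.25 (0.125 in the halved convention); the loss flattens out near the target, which is why gradient steps shrink as predictions improve.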

L1 norm loss

The L1 norm loss is also known as the absolute loss function. Instead of squaring the difference, we take the absolute value. The L1 norm is better for outliers than the L2 norm because it is not as steep for larger values. One issue to be aware of is that the L1 norm is not smooth at the target and this can result in algorithms not converging well. It appears as follows:
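A plain-Python sketch of the absolute loss:

```python
def l1_loss(pred, target):
    """Absolute distance to the target."""
    return abs(pred - target)
```

The loss grows linearly, so an outlier at distance 10 contributes 10 rather than 100; the trade-off is the kink at the target, where the gradient does not shrink as predictions approach it.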


Pseudo-Huber loss

Pseudo-Huber loss is a continuous and smooth approximation to the Huber loss function. This loss function attempts to take the best of the L1 and L2 norms by being convex near the target and less steep for extreme values. The form depends on an extra parameter, delta, which dictates how steep it will be. We will plot two forms, delta1 = 0.25 and delta2 = 5, to show the difference:

delta1 = tf.constant(0.25)
phuber1_y_vals = tf.multiply(tf.square(delta1), tf.sqrt(1. + tf.square((target - x_vals)/delta1)) - 1.)
phuber1_y_out = sess.run(phuber1_y_vals)
delta2 = tf.constant(5.)
phuber2_y_vals = tf.multiply(tf.square(delta2), tf.sqrt(1. + tf.square((target - x_vals)/delta2)) - 1.)
phuber2_y_out = sess.run(phuber2_y_vals)
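The same formula in plain Python, which makes it easy to check the two regimes (approximately quadratic near the target, roughly linear far from it):

```python
import math

def pseudo_huber(pred, target, delta):
    """delta^2 * (sqrt(1 + ((target - pred) / delta)^2) - 1)."""
    return delta ** 2 * (math.sqrt(1.0 + ((target - pred) / delta) ** 2) - 1.0)
```

For small errors this is close to half the squared error; for large errors it grows linearly with slope delta.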

Classification loss

Classification loss functions are used to evaluate loss when predicting categorical outcomes.

We need to redefine our predictions (x_vals) and target for classification. We will save the outputs and plot them in the next section. Use the following:
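A sketch of the classification setup; the article does not state the exact range, so the values below are an assumption (500 evenly spaced predictions between -3 and 5, with a target class of 1):

```python
# Classification setup sketch. The range [-3, 5] is an assumption;
# the TF1 equivalent would be tf.linspace(-3., 5., 500), tf.constant(1.).
n = 500
x_vals = [-3.0 + 8.0 * i / (n - 1) for i in range(n)]
target = 1.0
```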

Hinge loss

Hinge loss is mostly used for support vector machines but can be used in neural networks as well. It is meant to compute a loss between two target classes, 1 and -1. In the following code, we are using a target value of 1, so the closer our predictions are to 1, the lower the loss value:
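The standard hinge formula, max(0, 1 - pred * target), as a plain-Python sketch:

```python
def hinge_loss(pred, target=1.0):
    """max(0, 1 - pred * target), for target classes in {-1, +1}."""
    return max(0.0, 1.0 - pred * target)
```

A prediction at or beyond the target (pred >= 1 for target 1) incurs zero loss; predictions on the wrong side are penalized linearly.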

Cross-entropy loss

Cross-entropy loss for a binary case is also sometimes referred to as the logistic loss function. It comes about when we are predicting the two classes 0 or 1. We wish to measure a distance from the actual class (0 or 1) to the predicted value, which is usually a real number between 0 and 1. To measure this distance, we can use the cross-entropy formula from information theory, as follows:
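The formula in plain Python (for a target t in {0, 1} and a predicted probability p strictly between 0 and 1):

```python
import math

def binary_cross_entropy(pred, target):
    """-t*log(p) - (1 - t)*log(1 - p)."""
    return -target * math.log(pred) - (1.0 - target) * math.log(1.0 - pred)
```

A confident correct prediction (e.g. p = 0.9 for target 1) gives a small loss; p = 0.5 gives log(2) regardless of the target.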

Sigmoid cross-entropy loss

Sigmoid cross-entropy loss is very similar to the previous loss function
except we transform the x-values by the sigmoid function before we put them in the cross-entropy loss, as follows:
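A direct sketch of the idea (TensorFlow's tf.nn.sigmoid_cross_entropy_with_logits computes an algebraically equivalent, numerically stable form, max(x, 0) - x*t + log(1 + exp(-|x|)); the version below follows the text literally):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_cross_entropy(logit, target):
    """Binary cross-entropy applied to sigmoid(logit)."""
    p = sigmoid(logit)
    return -target * math.log(p) - (1.0 - target) * math.log(1.0 - p)
```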

Weighted cross-entropy loss

Weighted cross-entropy loss is a weighted version of the sigmoid cross-entropy loss. We provide a weight on the positive target. For example, we will weight the positive target by 0.5, as follows:
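A sketch mirroring the convention of tf.nn.weighted_cross_entropy_with_logits, where only the positive-target term is scaled (pos_weight = 0.5 matches the weighting in the text):

```python
import math

def weighted_cross_entropy(logit, target, pos_weight=0.5):
    """Sigmoid cross-entropy with the positive-target term scaled by pos_weight."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -pos_weight * target * math.log(p) - (1.0 - target) * math.log(1.0 - p)
```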

Softmax cross-entropy loss

Softmax cross-entropy loss operates on non-normalized outputs. This function is used to measure a loss when there is only one target category instead of multiple. Because of this, the function transforms the outputs into a probability distribution via the softmax function and then computes the loss function from a true probability distribution, as follows:
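The two steps (normalize via softmax, then take the cross-entropy against the true distribution) sketched in plain Python:

```python
import math

def softmax(logits):
    m = max(logits)  # shift by the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(logits, target_dist):
    """Cross-entropy between softmax(logits) and a true probability distribution."""
    return -sum(t * math.log(p) for t, p in zip(target_dist, softmax(logits)))
```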

Sparse softmax cross-entropy loss

Sparse softmax cross-entropy loss is the same as previously, except instead
of the target being a probability distribution, it is an index of which category is true. Instead of a sparse all-zero target vector with one value of one, we just pass in the index of which category is the true value, as follows:
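The same loss, sketched with a class index as the target. Computing it in log space (log-sum-exp of the logits minus the true logit) avoids forming the full probability vector:

```python
import math

def sparse_softmax_cross_entropy(logits, true_index):
    """-log(softmax(logits)[true_index]), computed in log space."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[true_index]
```

This gives exactly the same value as the dense version applied to a one-hot target vector.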

How it works…

Use matplotlib to plot the regression loss functions:

Plotting various regression loss functions.

Use matplotlib to plot the various classification loss functions:

Plots of classification loss functions.


Table summarizing the different loss functions that we have described:

Loss function | Use            | Benefits                      | Disadvantages
L2            | Regression     | More stable near the target   | Less robust to outliers
L1            | Regression     | More robust to outliers       | Not smooth at the target
Pseudo-Huber  | Regression     | Robust and smooth             | Extra parameter (delta)
Hinge         | Classification | Creates a max margin (SVMs)   | Affected by outliers
Cross-entropy | Classification | More stable                   | Less robust to outliers

Most of the classification loss functions described here are for two-class predictions. They can be extended to multiple classes by summing the cross-entropy terms over each prediction/target pair.

AI In Plain English

Go deeper with Artificial Intelligence, Machine Learning, Data Science, and Big Data.
