Do You Want to Learn AI? Learn Cross-Entropy First

Mahesh Pardeshi
Jul 18, 2019

Whenever you work on an AI or neural network project, finding the loss of your model is a basic step.

Let’s say you are solving a classification problem, such as deciding whether a given text has positive or negative sentiment. In any neural network problem, we start with random weights to train the model in the first round. After the first round is completed, we check the model’s predictions against the target values, and as expected, it will not give the correct predictions in the first round, right?

Now, to get correct predictions, we need to update the random weights we used in the previous round. The question is how to update these weights to get better predictions. The answer is by changing the weights in the right direction.

So how will we change these weights? The answer is by finding the loss in the previous result.

There are multiple loss functions to measure how far your predictions are from the actual results.

In this blog we will discuss one of the most common loss functions: the Cross-Entropy loss function.


Suppose we are predicting who will win the 2019 World Cup, and the two main options are England and India. You want to share this prediction with your friend overseas by sending the minimum amount of data, so instead of sending text, you prefer to send 1 bit of information, like 0 or 1, indicating which country will win.

If it is 0 for India, then India will lose the match and England will win.

In this case, we have to send only 1 bit of information, and we can find this number of bits using the log function as below.

bits = log2(number of possible outcomes) = log2(2) = 1
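To make this concrete, here is a tiny sketch in Python (the variable names are just illustrative); for two equally likely outcomes the formula gives exactly 1 bit:

import math

# Two equally likely outcomes: England wins, or India wins
num_outcomes = 2

# Bits needed per message when all outcomes are equally likely
bits = math.log2(num_outcomes)
print(bits)  # 1.0 -> a single 0 or 1 is enough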

Now suppose we have to send the prediction as probabilities, like England can win with 75% probability and India with 25%. The number of bits needed to encode each outcome is -log2 of its probability, so for England it is

bits = -log2(0.75) = 0.415

and for the remaining 25%, it will be

bits = -log2(0.25) = 2

To calculate the entropy, we take the average of these bit counts, weighted by the probabilities:

E = -(0.75*log2(0.75) + 0.25*log2(0.25))

E = 0.75*0.415 + 0.25*2

E = 0.81

So the entropy is 0.81, which is fairly high. If the data is 50% positive and 50% negative, the entropy will be 1, and if the data is 100% positive and 0% negative, the entropy will be 0. In a data science project, we need both types of data, positive and negative, for better training.
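Here is a small Python sketch (the entropy helper is my own illustrative function, not a library call) that reproduces these numbers:

import math

def entropy(probs):
    # Shannon entropy in bits: -sum(p * log2(p)), skipping zero probabilities
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(round(entropy([0.75, 0.25]), 2))  # 0.81 -> the England/India example
print(entropy([0.5, 0.5]))              # 1.0  -> 50% positive, 50% negative
print(entropy([1.0, 0.0]))              # 0.0  -> 100% positive, 0% negative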

We can use the same entropy concept to measure the loss, or error, of a model, through a quantity called Cross-Entropy.

Entropy is calculated from a single probability distribution on its own. In Cross-Entropy, we compare the model's estimated probability distribution against the real probability distribution.

Now if our prediction is exactly equal to the real probability distribution, the Cross-Entropy will be equal to the Entropy. But if they are not the same, the Cross-Entropy will be greater than the Entropy.
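A quick way to see this numerically (a sketch reusing the illustrative entropy helper, plus a cross_entropy helper that is likewise my own, not a library function):

import math

def entropy(probs):
    return sum(-p * math.log2(p) for p in probs if p > 0)

def cross_entropy(p, q):
    # -sum(p_i * log2(q_i)): p is the real distribution, q is the predicted one
    return sum(-pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

real = [0.75, 0.25]

print(round(entropy(real), 2))                      # 0.81
print(round(cross_entropy(real, [0.75, 0.25]), 2))  # 0.81 -> prediction matches reality
print(round(cross_entropy(real, [0.5, 0.5]), 2))    # 1.0  -> wrong prediction, larger than 0.81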

Take an example with four image classes: Cat, Dog, Horse, and Fox.

Now we have a one-hot encoding for each of them:

Cat = [1 0 0 0]

Dog = [0 1 0 0]

Horse = [0 0 1 0]

Fox = [0 0 0 1]

Here the first encoding says that it is 100% a cat, no doubt, and the second encoding says that it is 100% a dog. So there is no confusion.

So the real probability distribution for a cat image is P1 = [1 0 0 0], as the data tells us with 100% certainty that it is a cat, and its entropy is 0 since there is no uncertainty.

But suppose we pass this image to the machine and it predicts Q1 = [0.4 0.3 0.2 0.1], meaning 40% cat, 30% dog, 20% horse, and 10% fox. The prediction is not committed to one category; it is distributed across all the categories.

In this case, Cross-Entropy will help us find the loss:

H(P1, Q1) = -Sum(P1i * log2(Q1i))

H(P1, Q1) = -(1*log2(0.4) + 0*log2(0.3) + 0*log2(0.2) + 0*log2(0.1))

H(P1, Q1) = 1.32
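In Python, the same calculation looks like this (a sketch; log base 2 is assumed, and cross_entropy is the same illustrative helper as above):

import math

def cross_entropy(p, q):
    # -sum(p_i * log2(q_i)): p is the real one-hot label, q is the model's prediction
    return sum(-pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

P1 = [1, 0, 0, 0]          # real label: it is a cat
Q1 = [0.4, 0.3, 0.2, 0.1]  # the model's uncertain prediction

print(round(cross_entropy(P1, Q1), 2))  # 1.32 -> only the cat term survives: -log2(0.4)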

This is quite high. We want the Cross-Entropy to be close to 0.

So we expect the Cross-Entropy to be 0, but the model gives a Cross-Entropy of 1.32. What is this gap, if not the loss of the model?

Yes, it is a loss. The cross-entropy goes down as the prediction gets more and more accurate.
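We can check this with the same illustrative helper: the more confident and correct the prediction, the lower the cross-entropy.

import math

def cross_entropy(p, q):
    return sum(-pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

P1 = [1, 0, 0, 0]  # real label: cat

print(round(cross_entropy(P1, [0.4, 0.3, 0.2, 0.1]), 2))      # 1.32 -> uncertain prediction
print(round(cross_entropy(P1, [0.7, 0.1, 0.1, 0.1]), 2))      # 0.51 -> better prediction
print(round(cross_entropy(P1, [0.97, 0.01, 0.01, 0.01]), 2))  # 0.04 -> nearly perfect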

So our task is to reduce the loss, or error, by getting better and better predictions.

And the current results are based on random weights. Now we have to change the weights so that the model gives us better predictions and the cross-entropy becomes smaller.
