Entropy & Conditional Entropy

Yigit Can Turk
Published in Analytics Vidhya
4 min read · Nov 10, 2021

One of the terms commonly used in statistics-related fields is entropy. It plays an important role in many data-driven areas such as data mining, machine learning, and natural language processing (NLP). For example, it is used for node splitting in the decision tree algorithm. It also gives you an idea about the relation between two or more features in a data set, so you can extract information about the dependency between these features using the entropy formula.

What is entropy?

Entropy is a measure of the randomness of a variable in a sentence, a data set, etc. Using this metric, you can get an idea of whether a variable exists in a given state S. Since it measures randomness, the entropy of a variable goes to zero whenever its probability of existence approaches certainty: in that situation it is definite that the variable exists and no randomness remains. Conversely, the entropy increases and approaches 1 (for a binary variable, the entropy in bits lies between 0 and 1) whenever we are in the situation of "this variable may or may not exist, and it is hard to say which".

Entropy is 0 if the variable definitely exists, and 1 if it exists with probability 0.5 and does not exist with the same probability. This is easy to see from the formula H(X) = -Σ_x p(x) * log_2 p(x): since log 1 = 0, certain existence makes the result 0.

Let’s assume that we toss a coin (the most common example in any statistical topic). If this coin is fair (a normal coin), we know that the probability of getting heads or tails is the same and equals 0.5. In that case, the equal probability of either face appearing results in an entropy of 1:

H(X) = -0.5 * log_2(0.5) - 0.5 * log_2(0.5)

H(X) = -(0.5)(-1) - (0.5)(-1) = 1

On the other hand, if the variable exists with certainty, the entropy is 0 (using the convention that 0 * log_2(0) = 0):

H(X) = -1 * log_2(1) - 0 * log_2(0)

H(X) = -(1)(0) - 0 = 0

So we can plot the entropy of a binary variable against its probability in a graph.

Entropy on a graph
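To make the calculation concrete, here is a minimal Python sketch (the function name binary_entropy and the probability values in the loop are my own choices for this illustration); it reproduces the two results above and traces the shape of the curve.

```python
import math

def binary_entropy(p):
    """Entropy (in bits) of a binary variable that exists with probability p."""
    if p == 0 or p == 1:
        return 0.0  # by convention 0 * log_2(0) = 0, so the entropy is 0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# The two cases discussed above:
print(binary_entropy(0.5))  # 1.0 -> fair coin, maximum randomness
print(binary_entropy(1.0))  # 0.0 -> the outcome is certain

# Sweeping p from 0 to 1 traces the curve shown in the graph:
for p in [0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0]:
    print(f"p = {p:.2f}  H = {binary_entropy(p):.3f}")
```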

Conditional Entropy

Conditional entropy does not differ much from entropy itself. If you are familiar with probability courses, you must know conditional probability. It gives the probability of a variable A when the state of another variable B is known; for example, when variable B is known to exist, the probability that variable A exists is written as P(A | B=1). This is called conditional probability, and it is a key topic in any statistical field.

Formula of the conditional probability of variable A given B: P(A | B) = P(A, B) / P(B)
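As a quick sketch of that formula in Python (the counts below are invented purely for the illustration): P(A | B) is just the joint probability divided by the probability of the condition.

```python
# Toy joint counts over two binary variables A and B; the numbers are invented
# purely for this illustration.
counts = {
    ("A=1", "B=1"): 30,
    ("A=0", "B=1"): 10,
    ("A=1", "B=0"): 5,
    ("A=0", "B=0"): 55,
}
total = sum(counts.values())

p_a1_and_b1 = counts[("A=1", "B=1")] / total                      # P(A=1, B=1)
p_b1 = (counts[("A=1", "B=1")] + counts[("A=0", "B=1")]) / total  # P(B=1)

# Conditional probability: P(A=1 | B=1) = P(A=1, B=1) / P(B=1)
print(p_a1_and_b1 / p_b1)  # 0.75 with these counts
```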

We can combine the entropy metric described above with conditional probability to compute a new quantity called conditional entropy. This metric brings the relationship between two variables into the entropy calculation.
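Concretely, the combination can be written as H(A | B) = -Σ over a and b of P(a, b) * log_2 P(a | b). A minimal Python sketch, using an invented joint distribution just for illustration, could look like this:

```python
import math

# Invented joint distribution P(A, B) over two binary variables, used purely
# to illustrate the formula.
joint = {
    (1, 1): 0.30, (0, 1): 0.10,
    (1, 0): 0.05, (0, 0): 0.55,
}

def conditional_entropy(joint):
    """H(A | B) = -sum over (a, b) of P(a, b) * log_2 P(a | b)."""
    # Marginal distribution P(B = b)
    p_b = {}
    for (a, b), p in joint.items():
        p_b[b] = p_b.get(b, 0.0) + p
    h = 0.0
    for (a, b), p_ab in joint.items():
        if p_ab > 0:
            h -= p_ab * math.log2(p_ab / p_b[b])  # P(a | b) = P(a, b) / P(b)
    return h

print(conditional_entropy(joint))  # entropy of A that remains once B is known
```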

Let’s give an example of the usage of conditional entropy. There is a relation between the words Water and Drink. Can you deny it? No, because we know that 'people drink water'. So the word Water is likely to appear if the word Drink has been observed before it. We can say that the existence of the word Drink affects the probability of the existence of the word Water: its probability is no longer the same as when the word Drink was not observed. The word Drink reduces the randomness of the word Water, and conditional entropy is what measures this remaining randomness.

We can also comment further on the example above. We all know that if the word Drink appears in a sentence, the probability of the word Water is very high, because everyone has to drink water, but not necessarily coke or soda. The probability of drinking water is much higher than the probability of drinking soda. So we can say that the conditional entropy of the word Water given the word Drink is almost zero, or a number very close to zero (almost certain).

Conditional entropy of the word Meat given the word Eats
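As a hedged sketch of how such a value could be estimated from text: the tiny corpus below is invented, the matching is plain substring search, and the resulting number depends entirely on the data, so this only shows the mechanics of the estimate rather than the 'almost zero' value itself.

```python
import math

# Invented toy corpus; each string stands for one sentence. A real estimate
# would of course need far more data.
sentences = [
    "people drink water every day",
    "i drink water after running",
    "they drink water with dinner",
    "she likes to drink soda",
    "water is essential for life",
    "the lake water is cold",
    "he eats meat on weekends",
    "it is sunny today",
]

def h_binary(p):
    """Binary entropy in bits, with the 0 * log_2(0) = 0 convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# H(Water | Drink): average the entropy of "Water appears" over the two
# conditions "Drink appears" and "Drink does not appear".
h = 0.0
for has_drink in (True, False):
    group = [s for s in sentences if ("drink" in s) == has_drink]
    p_group = len(group) / len(sentences)
    p_water = sum("water" in s for s in group) / len(group)
    h += p_group * h_binary(p_water)

print(h)  # estimated conditional entropy of Water given Drink, in bits
```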

Where is entropy used?

Some fields where entropy is used are:

  • Machine learning algorithms (e.g. Decision tree algorithm)
  • Natural language processing (NLP)
  • Feature engineering
  • Regularization in reinforcement learning
