# Problem Setup

Suppose there is an unknown target function, or “true function,” f(X) that maps an input vector X to an output Y. For instance, f(X) could be the function that takes the features of a house (number of bedrooms, distance to the nearest hospital, etc.) and maps them to its corresponding price Y.

For any given input X there might not exist a unique label Y: we could imagine two houses with identical features but different prices. This randomness could be due to various factors that affect Y but that we haven't taken into account in our function f. The relationship between the seller and the owner can be one such factor. …
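This setup can be sketched in a few lines: a hypothetical "true function" f plus random noise standing in for all the unmodeled factors. The function, its coefficients, and the noise level below are made up purely for illustration.

```python
import random

# Hypothetical "true function": a house price from the number of bedrooms
# and the distance (km) to the nearest hospital. Coefficients are invented
# for this sketch, not fitted to any data.
def f(bedrooms, distance_km):
    return 50_000 + 30_000 * bedrooms - 2_000 * distance_km

# Two houses with identical features can still sell for different prices:
# the unmodeled factors show up as random noise added on top of f(X).
def observed_price(bedrooms, distance_km, noise_std=5_000):
    return f(bedrooms, distance_km) + random.gauss(0, noise_std)

base = f(3, 2)                  # deterministic part, identical for both houses
house_a = observed_price(3, 2)  # same features ...
house_b = observed_price(3, 2)  # ... yet (almost surely) a different price
```

Calling `observed_price(3, 2)` twice returns two different numbers even though the inputs are identical, which is exactly the non-uniqueness of Y described above.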

# Entropy

KL divergence has its origins in information theory. But before understanding it, we need to understand another important quantity in information theory called entropy. Since this article is not about entropy, I will not cover it in depth here; I wrote about it in detail here.

The main goal of information theory is to quantify how much information is in the data. Events that are rare (low probability) are more surprising and therefore carry more information than events that are common (high probability). For example, would you be surprised if I told you that a coin with heads on both sides came up heads? No, because the outcome did not give you any new information. …
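This idea of "surprise" has a standard quantitative form, the self-information −log₂(p) of an event with probability p, measured in bits. A minimal sketch:

```python
import math

# Self-information (surprise) of an event with probability p, in bits:
# rarer events carry more bits, certain events carry zero.
def information(p):
    return -math.log2(p)

information(1.0)   # two-headed coin lands heads: 0.0 bits, no surprise
information(0.5)   # fair coin flip: 1.0 bit
information(0.25)  # a 1-in-4 event: 2.0 bits
```

The certain event (p = 1) yields exactly 0 bits, matching the two-headed-coin example above; halving the probability adds one bit of surprise each time.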

# Thinking in terms of Bits

Imagine you want to send the outcomes of 3 coin flips to your friend's house. Your friend knows that you want to send him those messages, but all he can do is get answers to Yes/No questions he has arranged. Let's assume the arranged question is: Is it heads? You will send him a sequence of zeros and ones as answers to those questions, each commonly known as a bit (binary digit). If zero represents No and one represents Yes, and the actual outcome of the tosses was Head, Head, and Tail, then you would need to send him 1 1 0 to convey your facts (information). …
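The encoding scheme above can be written as a one-line translation from outcomes to bits (the `"H"`/`"T"` labels are just a convention chosen for this sketch):

```python
# Answer the arranged question "Is it heads?" for each flip:
# 1 = Yes (heads), 0 = No (tails).
def encode(flips):
    return [1 if flip == "H" else 0 for flip in flips]

encode(["H", "H", "T"])  # → [1, 1, 0]
```

Three flips cost three answers, i.e. three bits, with this question.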

# Probability

Often in life, we are confronted with uncertainty, be it in rolling dice, stock prices, the winner of the Champions League, or anything else. Suppose I have a coin and I am going to flip it. How likely is it to come up heads, tails, or even land on its side? By instinct, we say it is less likely to land on its side as an outcome of our experiment. But how can we represent such uncertainty with numbers? This is where probability comes into play.

It is a mathematical tool that helps quantify the uncertainty of events, so that we know what is likely to happen and what is not. Probability is a measure of how strongly we believe things about the world. …
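One concrete way to put a number on that uncertainty is the frequentist view: repeat the experiment many times and take the fraction of trials in which the event occurred. A small simulation sketch, assuming a fair coin and ignoring the "lands on its side" outcome:

```python
import random

# Estimate P(heads) empirically as the long-run fraction of flips
# that come up heads, simulating a fair coin.
def estimate_p_heads(n_flips, seed=0):
    rng = random.Random(seed)  # seeded for reproducibility
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

estimate_p_heads(100_000)  # close to 0.5
```

With more flips the estimate settles near 0.5, the number we would assign to our belief that a fair coin comes up heads.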