In this post, we are going to learn everything we need to know about bias and variance and their decomposition, with the help of an example and visualizations.

Photo by Bret Kavanaugh on Unsplash

Problem Setup

Suppose there is an unknown target function, or “true function”, f(X) that maps an input vector X to an output Y. For instance, f(X) could be a function that takes the features of a house (number of bedrooms, distance to the nearest hospital, etc.) and maps them to the corresponding price Y.

For any given input X, there might not exist a unique label Y: we could imagine two houses with identical features but different prices. This randomness could be due to various factors that affect Y but that we have not taken into account in our function f. The relationship between the seller and owner can be one such factor. …
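To make this concrete, here is a minimal sketch that assumes the common additive-noise model, Y = f(X) + ε, where ε is zero-mean Gaussian noise standing in for every factor that f ignores. The excerpt does not state the noise distribution, and the function `f`, its coefficients, and the noise scale are hypothetical values chosen only for illustration.

```python
import random


def f(bedrooms: int, distance_km: float) -> float:
    """Hypothetical "true function": the coefficients are invented for illustration."""
    return 50_000 + 40_000 * bedrooms - 5_000 * distance_km


def observed_price(bedrooms: int, distance_km: float, rng: random.Random) -> float:
    """Observed label Y = f(X) + noise, with the noise assumed to be zero-mean Gaussian."""
    noise = rng.gauss(0.0, 10_000.0)  # stands in for factors that f does not model
    return f(bedrooms, distance_km) + noise


rng = random.Random(0)
x = (3, 2.0)  # identical features for two different houses
y1 = observed_price(*x, rng)
y2 = observed_price(*x, rng)
print(f(*x))   # 160000.0
print(y1, y2)  # two different prices for the same X

# Averaging many noisy labels for the same X approaches f(X), because the noise has mean 0.
samples = [observed_price(*x, rng) for _ in range(10_000)]
print(sum(samples) / len(samples))
```

The same input X produces different labels, which is exactly why no model of X alone can predict Y perfectly.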

In this post, we are going to look at a way of comparing two probability distributions using KL divergence, and we will also find its relation to cross-entropy.

Photo by Lavi Perchik on Unsplash


KL divergence has its origins in information theory. But before we can understand it, we need to understand another important metric in information theory called entropy. Since this article is not about entropy, I will not cover it in depth here. I wrote about it in detail here.

The main goal of information theory is to quantify how much information is in data. Rare (low-probability) events are more surprising and therefore carry more information than common (high-probability) events. For example, would you be surprised if I told you that a coin with heads on both sides came up heads? No, because the outcome gives you no new information. …
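A small sketch of these ideas, using the standard definitions: the information (surprise) of an event with probability p is log2(1/p) bits, entropy is the average information, and cross-entropy and KL divergence compare a distribution p with a model q. The identity checked at the end, H(p, q) = H(p) + KL(p || q), is the standard relation between cross-entropy and KL divergence. The function names and the biased-coin model `q` are illustrative assumptions, not taken from the article.

```python
import math


def information(p: float) -> float:
    """Self-information (surprise) of an event with probability p, in bits."""
    return math.log2(1.0 / p)


def entropy(p: list[float]) -> float:
    """H(p): expected information of distribution p, in bits (terms with p_i = 0 contribute 0)."""
    return sum(pi * information(pi) for pi in p if pi > 0)


def cross_entropy(p: list[float], q: list[float]) -> float:
    """H(p, q): expected bits when outcomes follow p but are coded using q (needs q_i > 0 where p_i > 0)."""
    return sum(pi * information(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(p || q) = sum_i p_i * log2(p_i / q_i), in bits (needs q_i > 0 where p_i > 0)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


print(information(1.0))   # 0.0: a two-headed coin landing heads tells us nothing new
print(information(0.5))   # 1.0: a fair coin landing heads
print(information(0.01))  # about 6.64: a rare event is far more surprising

p = [0.5, 0.5]  # fair coin
q = [0.9, 0.1]  # a hypothetical, badly biased model of that coin
print(entropy(p))  # 1.0
print(round(cross_entropy(p, q), 3), round(kl_divergence(p, q), 3))  # 1.737 0.737
# Cross-entropy decomposes as entropy plus KL divergence: H(p, q) = H(p) + KL(p || q).
print(math.isclose(cross_entropy(p, q), entropy(p) + kl_divergence(p, q)))  # True
```

Because H(p) does not depend on q, minimizing cross-entropy over q is the same as minimizing KL(p || q).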

Why is it that the more uncertain we are about the result of an experiment, the more information we get after observing it? What is the relation between entropy and information? You will find the answers to these questions and more in this article.

Photo by Roman Kraft on Unsplash

Thinking in terms of Bits

Imagine you want to send the outcomes of 3 coin flips to your friend’s house. Your friend knows that you want to send him these outcomes, but all he can do is receive answers to yes/no questions he has arranged in advance. Let’s assume the arranged question is: Is it heads? You send him a sequence of zeros and ones as answers to these questions; each such digit is commonly known as a bit (binary digit). If zero represents No, one represents Yes, and the actual outcomes of the tosses were Heads, Heads, and Tails, then you would need to send him 1 1 0 to convey your facts (information). …
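The scheme above can be sketched in a few lines. This assumes the coin is fair, so each yes/no answer is equally likely; the function names `encode` and `decode` are hypothetical helpers for illustration.

```python
import math


def encode(outcomes: list[str]) -> str:
    """Answer the pre-arranged question "Is it heads?" for each flip: 1 = Yes, 0 = No."""
    return "".join("1" if o == "H" else "0" for o in outcomes)


def decode(bits: str) -> list[str]:
    """Recover the flips from the answers."""
    return ["H" if b == "1" else "T" for b in bits]


message = encode(["H", "H", "T"])
print(message)          # 110
print(decode(message))  # ['H', 'H', 'T']

# With a fair coin, each answer is equally likely to be 0 or 1,
# so every flip carries log2(2) = 1 bit and 3 flips need 3 bits.
n_flips = 3
print(n_flips * math.log2(2))   # 3.0
# Equivalently: there are 2**3 = 8 equally likely sequences, and log2(8) = 3.
print(math.log2(2 ** n_flips))  # 3.0
```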

In this post, I am going to introduce you to the basics of probability, including the axioms, conditional probability, mutually exclusive and independent events, and the sum and product rules, with lots of examples.

Photo by 🇨🇭 Claudio Schwarz | @purzlbaum on Unsplash


Often in life, we are confronted with uncertainty, be it in rolling dice, stock prices, the winner of the Champions League, or anything else. Suppose I have a coin and I am going to flip it. How likely is it to come up heads, tails, or even land on its side? Instinctively, we say that landing on its side is a less likely outcome of our experiment. But how can we represent such uncertainty in numbers? This is where probability comes into play.

Probability is a mathematical tool that helps quantify the uncertainty of events, so that we know what is likely to happen and what is not. It is a measure of how strongly we believe things about the world. …
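The rules listed above can be checked by brute force on a small sample space. This sketch assumes two fair dice (so all 36 outcomes are equally likely) and uses exact fractions; the event names are illustrative choices, not examples from the article.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely outcomes of rolling two fair dice.
OMEGA = list(product(range(1, 7), repeat=2))


def prob(event) -> Fraction:
    """P(event) = (# outcomes where the event holds) / (# outcomes in the sample space)."""
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))


def cond(a, b) -> Fraction:
    """Conditional probability P(A | B) = P(A and B) / P(B), assuming P(B) > 0."""
    return prob(lambda w: a(w) and b(w)) / prob(b)


def first_six(w):
    return w[0] == 6


def second_six(w):
    return w[1] == 6


def sum_two(w):
    return sum(w) == 2


def sum_seven(w):
    return sum(w) == 7


def sum_at_least_11(w):
    return sum(w) >= 11


# Axiom: the whole sample space has probability 1.
print(prob(lambda w: True))  # 1

# Sum rule: P(A or B) = P(A) + P(B) - P(A and B)
p_either_six = prob(lambda w: first_six(w) or second_six(w))
print(p_either_six)  # 11/36

# Mutually exclusive events cannot happen together, so P(A or B) = P(A) + P(B).
print(prob(lambda w: sum_two(w) or sum_seven(w)))  # 7/36

# Independent events: P(A and B) = P(A) * P(B).
print(prob(lambda w: first_six(w) and second_six(w)) == prob(first_six) * prob(second_six))  # True

# Dependent events: knowing the sum is at least 11 changes P(first die is 6) from 1/6.
print(cond(first_six, sum_at_least_11))  # 2/3

# Product rule: P(A and B) = P(A | B) * P(B)
p_joint = prob(lambda w: first_six(w) and sum_at_least_11(w))
print(p_joint == cond(first_six, sum_at_least_11) * prob(sum_at_least_11))  # True
```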


Sobit Regmi

I am an engineer working in the multidisciplinary field of data science. Currently, I am working as a Machine Learning Engineer Associate at Fusemachines.
