2020 is a special year for all Americans: it is the year of the US Census. The US Census is one of the biggest population surveys in the world and is mandated by the US Constitution. It takes place every ten years and helps the government decide where to build and maintain schools, hospitals, transportation infrastructure, and police departments, to name just a few. All in all, the various censuses determine the allocation of over $400 billion in federal funds every year.
To celebrate this occasion, I collected 230 years of US census data and turned it into a bar chart race. I documented all the steps that went into this data project so that you can follow them if you would like to tackle a similar project in the future. …
In this post, we explain how bias and noise in machine learning are two sides of the same coin.
God does not play dice. — Albert Einstein
Einstein famously made this statement in reaction to the emerging theory of quantum mechanics, which seemed to defy the fundamental laws of physics. Read on to see how the same goes for the fundamental laws of machine learning.
Before getting into it, let’s briefly review the classical bias-variance-noise tradeoff. …
In this post, we explain the bias-variance tradeoff in machine learning at three different levels: simple, intermediate, and advanced. We will follow up with some illustrative examples and discuss some practical implications at the end.
If you can’t explain it simply, you don’t understand it well enough. — Albert Einstein
Understanding the bias-variance tradeoff can be a bit tricky at first as it involves several quantities that are never available in practice. Nonetheless, it provides insights into best practices for optimizing real-world machine learning applications.
The prediction error of a machine learning model, that is, the difference between the ground truth and the trained model's prediction, can be decomposed as the sum of two error terms: the bias term and the variance term. Because the total error is the sum of these two terms, and reducing one typically increases the other, there is a tradeoff between the two. …
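To make the decomposition concrete, here is a minimal simulation sketch. It is not taken from the post: the function name `bias_variance`, the sine ground truth, the noise level of 0.3, and the polynomial models are all assumptions chosen for illustration. The error is measured against the noise-free ground truth, so it splits into exactly a squared bias term and a variance term.

```python
import numpy as np

def bias_variance(degree, n_train=30, n_trials=500, seed=0):
    """Estimate squared bias, variance, and total error of a polynomial fit.

    Hypothetical setup: the ground truth is sin(x) on [0, pi] and training
    labels carry Gaussian noise with standard deviation 0.3.
    """
    rng = np.random.default_rng(seed)
    true_fn = np.sin                       # assumed ground truth
    x_test = np.linspace(0.0, np.pi, 50)   # fixed evaluation points
    preds = np.empty((n_trials, x_test.size))

    # Train the same model class on many independently drawn training sets.
    for t in range(n_trials):
        x = rng.uniform(0.0, np.pi, n_train)
        y = true_fn(x) + rng.normal(0.0, 0.3, n_train)
        coeffs = np.polyfit(x, y, degree)
        preds[t] = np.polyval(coeffs, x_test)

    truth = true_fn(x_test)
    mean_pred = preds.mean(axis=0)

    bias_sq = np.mean((mean_pred - truth) ** 2)  # error of the average model
    variance = np.mean(preds.var(axis=0))        # spread across training sets
    total = np.mean((preds - truth) ** 2)        # average squared error
    return bias_sq, variance, total

if __name__ == "__main__":
    for degree in (0, 5):
        b, v, total = bias_variance(degree)
        print(f"degree={degree}: bias^2={b:.4f} variance={v:.4f} total={total:.4f}")
```

Under these assumptions, the constant model (degree 0) should show the larger squared bias and the degree-5 model the larger variance, while in both cases the total error equals the sum of the two terms up to floating-point precision.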