Whenever I come across a topic for which I can’t find a sufficiently good tutorial or explanation online, I feel compelled to offer one. I hope this helps you.

The hypergeometric distribution describes the probability of events in the following scenario:

Suppose you have a jar containing 10 red marbles and 90 black marbles.

You collect 10 marbles from the jar.

What is the probability you collect **k** red marbles?

Collecting a single red marble seems intuitively most likely, but if you collected none or a couple, that wouldn’t be too surprising. …
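That intuition can be checked numerically. Here is a minimal sketch of the hypergeometric probability mass function using Python’s standard-library `math.comb`; the function name and defaults are my own:

```python
from math import comb

def hypergeom_pmf(k, n_red=10, n_black=90, n_draws=10):
    """P(exactly k red marbles when drawing n_draws marbles,
    without replacement, from a jar of n_red red and n_black black)."""
    total = n_red + n_black
    return comb(n_red, k) * comb(n_black, n_draws - k) / comb(total, n_draws)

for k in range(4):
    print(k, round(hypergeom_pmf(k), 3))
```

Running this shows k = 1 is indeed the single most likely outcome, with k = 0 and k = 2 not far behind.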

This article is about different ways of regularizing regressions. In the context of classification, we might use logistic regression, but these ideas apply just as well to any kind of regression or GLM.

With *binary* logistic regression, the goal is to find a way to separate your two classes. There are a number of ways of visualizing this.

No matter which of these you choose to think of, we can agree logistic regression defines a decision rule

h(x|theta) = 1 / (1 + exp(−theta·x)), predicting class 1 whenever h(x|theta) ≥ 1/2,

and seeks a theta which minimizes some objective function, usually

loss(theta) = −∑ [ y log(h(x|theta)) + (1−y) log(1−h(x|theta)) ]

which is obfuscated by a couple clever tricks. It is derived from the intuitive objective…
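The hypothesis and the loss above translate almost line-for-line into code. A minimal sketch in plain Python, where the helper names and the toy data set are my own:

```python
import math

def h(x, theta):
    """Logistic hypothesis: sigmoid of the linear score theta . x."""
    return 1.0 / (1.0 + math.exp(-sum(t * xi for t, xi in zip(theta, x))))

def loss(theta, X, y):
    """Negative log-likelihood (cross-entropy) summed over the data set."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = h(xi, theta)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total

# Tiny 1-D example; each x gets a constant intercept feature prepended.
X = [(1.0, -2.0), (1.0, -1.0), (1.0, 1.0), (1.0, 2.0)]
y = [0, 0, 1, 1]
print(loss((0.0, 1.0), X, y))
```

At theta = (0, 0) every prediction is 0.5 and the loss is 4·log 2; a theta whose second component is positive separates these points and lowers the loss, which is exactly what the optimizer seeks.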

“‘All models are wrong, but some are useful.’

So proclaimed statistician George Box 30 years ago, and he was right. But what choice did we have? Only models, from cosmological equations to theories of human behavior, seemed to be able to consistently, if imperfectly, explain the world around us. Until now. Today companies like Google, which have grown up in an era of massively abundant data, don’t have to settle for wrong models. Indeed, they don’t have to settle for models at all.”

So proclaimed WIRED editor-in-chief Chris Anderson 7 years ago, opening the July 2008 issue of stories on the advent of “*The Petabyte Age*” with his piece entitled “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete”. …
