The ultimate overfitting solve

Erin Fahrenkopf
Jul 28, 2017 · 2 min read

To fundamentally solve the overfitting problem, we don’t fit. Instead, we strategize for success regardless of the exact predictions produced by a well-fit model. We don’t need to throw out our data to do this; we just use it differently.

We focus our analysis on understanding what’s possible, not what’s most likely. We look at the dispersion in the data and extrapolate what other values could come from a distribution with that dispersion. We then use the range of possible values to inform our decision making or strategy. A traditional fitting method may show that extreme values are unlikely. But if our distributional, what’s-possible method shows that such extremes fall within a distribution consistent with our data, then we know to consider the possibility of these values.
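As a rough illustration of the "what’s possible" view, the sketch below turns a sample into a range of values consistent with its dispersion, rather than a single best guess. The function name `plausible_range`, the choice of mean plus or minus `k` standard deviations, the default `k=3`, and the sample numbers are all assumptions made for this example, not something the method prescribes.

```python
import statistics

def plausible_range(samples, k=3.0, lower_bound=0.0):
    """Return (low, high): the values within k standard deviations of the mean.

    k and lower_bound are illustrative assumptions; a different notion of
    dispersion (quantiles, a fitted heavy-tailed family) could be swapped in.
    """
    center = statistics.mean(samples)
    spread = statistics.stdev(samples)  # sample standard deviation
    low = max(lower_bound, center - k * spread)  # clip at the lower bound
    high = center + k * spread
    return low, high

# Hypothetical observations, invented for illustration only.
observations = [90, 110, 150, 127, 95, 160, 140, 120, 130, 148]

low, high = plausible_range(observations)
print(f"most likely value (mean): {statistics.mean(observations):.0f}")
print(f"values to plan for: {low:.0f} to {high:.0f}")
```

The point of the sketch is the second printed line: the decision is informed by the whole range, not by the mean alone.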

We can then focus our analysis on understanding the impact of outcomes rather than their likelihood. What would happen to us if the best-case value from our distributional analysis were realized? What about the worst case? For example, we would focus on what would happen to our bakery if daily croissant orders were 1, 100, or 1000: would the bakery survive if it sold only 1 a day, and would it be able to produce 1000 if demand were that high? This is again different from a likelihood analysis, which would give the expected number of daily croissant orders (for example, “we expect to sell 127 croissants a day”).
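The bakery question can be sketched as a small impact table. Everything numeric here (the margin per croissant, the fixed daily cost, the oven capacity) is a hypothetical figure chosen for illustration, and the `Bakery` class and `impact` function are names invented for this example.

```python
from dataclasses import dataclass

@dataclass
class Bakery:
    # All figures are hypothetical, chosen only to illustrate the idea.
    margin_per_croissant: float = 2.0  # profit on each croissant sold
    fixed_daily_cost: float = 150.0    # rent, wages, and so on
    daily_capacity: int = 400          # most croissants that can be baked per day

def impact(bakery, demand):
    """Describe what a given level of daily demand would do to the bakery."""
    sold = min(demand, bakery.daily_capacity)  # cannot sell more than we bake
    profit = sold * bakery.margin_per_croissant - bakery.fixed_daily_cost
    return {
        "demand": demand,
        "sold": sold,
        "unmet": demand - sold,       # orders turned away
        "profit": profit,
        "covers_costs": profit >= 0,  # would the bakery survive a day like this?
    }

bakery = Bakery()
for demand in (1, 100, 1000):  # the three outcomes considered in the text
    print(impact(bakery, demand))
```

No probability appears anywhere in the table; each row only answers what would happen if that outcome were realized.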

To build up an understanding of the impact of outcomes, we examine what we would do and how we would fare given different possible values of our data. We then strategize to fare well under as many outcomes as possible and, ideally, find a strategy with which we can succeed under all possible outcomes. If we can get to a place where we will be successful (or build something that can function) under all possible outcomes, then we have fundamentally solved the overfitting problem. The fit and the chance of overfitting don’t matter because we will thrive regardless of the outcome’s exact realization.
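One way to picture the search for a strategy that holds up everywhere is to score each candidate by the number of outcomes it survives and prefer one that survives them all. The three strategies, their costs and capacities, and the definition of success as "does not lose money that day" are all assumptions made up for this sketch.

```python
def profit(demand, capacity, fixed_cost, margin=2.0):
    """Daily profit for a hypothetical bakery setup at a given demand."""
    sold = min(demand, capacity)
    return sold * margin - fixed_cost

# Hypothetical strategies; the numbers are invented for illustration.
strategies = {
    "small shop": {"capacity": 150, "fixed_cost": 100.0},
    "large shop": {"capacity": 1000, "fixed_cost": 400.0},
    "bake to order from home": {"capacity": 200, "fixed_cost": 0.0},
}
outcomes = [1, 100, 1000]  # possible daily orders, not weighted by likelihood

def survives(strategy, demand):
    # Assumed success criterion: the day does not lose money.
    return profit(demand, **strategy) >= 0

def most_robust(strategies, outcomes):
    """Return (name, succeeds_everywhere) for the strategy surviving most outcomes."""
    def score(name):
        return sum(survives(strategies[name], d) for d in outcomes)
    best = max(strategies, key=score)
    return best, score(best) == len(outcomes)

name, succeeds_everywhere = most_robust(strategies, outcomes)
print(name, succeeds_everywhere)
```

The criterion is deliberately crude: it ignores unmet demand, for instance. A fuller version would encode whatever "success" means for the business, but the selection rule stays the same: count outcomes survived, not outcomes expected.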

To sum up, the overfitting solve is to use data to inform what best to hope for and what worst to prepare for, not which one is more likely.


Originally published at ablifeing.blogspot.com.

Written by Erin Fahrenkopf. Interests are statistics, data, the organization of work, evolution and using science practically.
