What are some solutions to overfitting?

I have described the overfitting problem and now want to share some solutions. Below I go through them, grouped by similarity. There is no perfect solution, which is why overfitting is such a fun problem to learn about and work on.

Solution bucket 1: Feedback

We can use feedback on whether our model is correctly predicting the variable we are estimating whenever we can get that variable's real value. We treat a correct prediction as positive feedback (a sign that the model is doing well) and an incorrect prediction as negative feedback. With this feedback, we can assess and modify the model to be more accurate as outside data comes in. A popular way to use feedback to assess and reduce overfitting is to divide the generation dataset into two (or more) segments, generate/train the model on one segment, and then test how well the model predicts in the other segment(s). If the model is similarly accurate in all data segments, we conclude there is no sign of an overfitting problem. But we still have limited knowledge of how the model will do on new data, which is where we are actually interested in making predictions.
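The segment-and-test idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the data is synthetic (y = 2x plus noise), the 70/30 split is arbitrary, and the helper names (split, fit_line, fit_memorizer, mse) are made up for this example. The "memorizer" is a deliberately overfit model that just repeats the outcome of the closest training point.

```python
import random

def split(rows, holdout_fraction=0.3, seed=0):
    """Shuffle the rows, then split them into a training and a holdout segment."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]

def fit_line(rows):
    """Ordinary least squares for y = a + b*x."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x

def fit_memorizer(rows):
    """Deliberately overfit model: return the y of the closest training x."""
    return lambda x: min(rows, key=lambda r: abs(r[0] - x))[1]

def mse(model, rows):
    """Mean squared prediction error of the model on the given rows."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)

# Synthetic generation dataset (an assumption for illustration): y = 2x + noise.
noise = random.Random(42)
data = [(i * 0.1, 2.0 * (i * 0.1) + noise.gauss(0, 3.0)) for i in range(200)]

train, holdout = split(data)

# results[name] = (error on the training segment, error on the holdout segment)
results = {}
for name, fit in [("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train)
    results[name] = (mse(model, train), mse(model, holdout))
    print(name, "train MSE:", results[name][0], "holdout MSE:", results[name][1])
```

The memorizer is perfect on the segment it was trained on and worse on the holdout segment, which is the overfitting signal. The straight line should typically show similar errors in both segments.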

Solution bucket 2: Theory

You can condition the information gleaned from your data-generated model using theoretical knowledge. In the extreme, you could forget about the data and just operate from theory, which would eliminate overfitting since you would have no fit to begin with. But more realistically, you could use theory to help direct the relationships allowed between variables in your data-generated model or to guide how you interpret your data. For example, we may have a theoretical understanding of the weather such that we know the temperature on Thursdays at 5pm is always 3 degrees warmer than the 5pm temp on the Wednesday before. Thus, in our model predicting Thursday's 5pm temp, all we need from the data is the prior day’s 5pm temp, and we can ignore all other weather data. Then we apply our “theoretical knowledge” and get an estimate of Thursday’s 5pm temp. We are not overfitting to weather data here because we are relying on our theory and only a little data to get our estimates. On the flip side, we probably are not going to be that accurate.
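As a tiny sketch, the weather "theory" above boils down to one line of arithmetic with nothing estimated from data. The function name and the observed temperature pairs below are hypothetical, invented only to show how such a theory-only model could be checked against real values.

```python
THEORY_OFFSET = 3.0  # degrees; the made-up "theory" from the example above

def predict_thursday_5pm(wednesday_5pm_temp):
    """Theory-only prediction: nothing here is estimated from data."""
    return wednesday_5pm_temp + THEORY_OFFSET

# Hypothetical (Wednesday 5pm, Thursday 5pm) pairs, invented for illustration.
observed = [(60.0, 64.0), (55.0, 57.0), (70.0, 66.0)]

# How far off is the theory on each observed pair?
errors = [abs(predict_thursday_5pm(wed) - thu) for wed, thu in observed]
mean_abs_error = sum(errors) / len(errors)
print("mean absolute error:", mean_abs_error)
```

Because no parameter is fit, there is nothing to overfit. The price is that the error is whatever the theory happens to give us.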

Solution bucket 3: Grain of salt

We can simply take our model’s predictions with a grain of salt and realize they are overfit to the generation dataset and will give less accurate estimates on any other dataset. We simply remember Fit(generation dataset) > Fit(outside dataset) and make decisions accordingly. One way to do this is to deflate the expected size of the relationships estimated by your model, and another is to limit the fit in the generation dataset. The downside of taking your model with a grain of salt is that we want precise predictions (or models that work well), and the grain of salt puts our models’ limitations front and center.
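Both options can be sketched for a one-variable linear model. This is a minimal illustration under my own assumptions: the five data points are made up, the 0.8 deflation factor and the penalty of 5.0 are arbitrary, and I am using a ridge-style penalty as one common way to limit the fit (it is not the only way).

```python
def slope_and_intercept(rows, penalty=0.0):
    """Least-squares line for y = a + b*x with an optional ridge-style penalty.

    penalty=0.0 gives the ordinary fit; a larger penalty pulls the slope
    toward zero, which limits how closely the line can fit the data.
    """
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / (sxx + penalty)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up generation dataset, for illustration only.
rows = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]

# Plain fit to the generation dataset.
ols_slope, ols_intercept = slope_and_intercept(rows)

# Option 1: deflate the estimated relationship after fitting.
deflated_slope = 0.8 * ols_slope  # 0.8 is an arbitrary illustrative factor

# Option 2: limit the fit while fitting.
ridge_slope, ridge_intercept = slope_and_intercept(rows, penalty=5.0)

print("plain slope:", ols_slope)
print("deflated slope:", deflated_slope)
print("penalized slope:", ridge_slope)
```

Either way, we end up acting on a smaller relationship than the generation dataset alone suggests.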

Solution bucket 4: Widening the generation dataset

We can increase the context of the generation dataset so we have data on variation covering a wider range of the world. This is different than just increasing the amount of data in the generation dataset, because adding data that does not reflect different situations does not help. Instead, we want to add data that captures history from farther back or slightly different situations. In other words, we want to add data that widens the extent of information captured in the generation dataset. In one of my research projects, I was generating a model of effective group collaboration based on data from a single group experiment. To help reduce the extent of overfitting in my collaboration model, I found data from other experiments that had different setups but also examined groups. Including data from multiple group experiments, as opposed to just one, widened my generation dataset since each experiment had a different group task, manipulation and overall procedure.

Currently, I am all about this solution! But as with all of the solutions, it is no panacea. Even when you widen the generation dataset, you are still at risk that the domain where you plan to predict is different than what’s covered in the generation dataset. We may have reduced the chance that what we want to predict is fundamentally different than what our model expects, but we have not fundamentally solved the problem.
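Here is a toy sketch of why wider data can beat more data. Everything in it is an assumption for illustration: the "world" is a made-up curve (y = x squared), the model is a straight line, and the ranges stand in for different situations. It has nothing to do with my collaboration data.

```python
def fit_line(rows):
    """Ordinary least squares for y = a + b*x."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x

def mse(model, rows):
    """Mean squared prediction error of the model on the given rows."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)

def truth(x):
    """The made-up 'world' that generates outcomes."""
    return x * x

def sample(start, stop, n):
    """n evenly spaced x values from start to stop, paired with true outcomes."""
    step = (stop - start) / (n - 1)
    return [(start + i * step, truth(start + i * step)) for i in range(n)]

# One narrow situation, the same situation with far more rows,
# and a widened dataset that covers three different situations.
datasets = {
    "narrow": sample(0.0, 1.0, 11),
    "narrow_more": sample(0.0, 1.0, 1001),
    "wide": sample(0.0, 1.0, 11) + sample(2.0, 3.0, 11) + sample(4.0, 5.0, 11),
}

# The domain where we actually want to predict (not in any dataset above).
target = sample(3.0, 4.0, 11)

errors = {name: mse(fit_line(rows), target) for name, rows in datasets.items()}
for name, error in errors.items():
    print(name, "MSE on target domain:", error)
```

In this toy setup, adding many more rows from the same narrow range barely changes the fitted line, while the widened dataset predicts the target domain much better. It still does not predict it perfectly, which is the "no panacea" point.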


Originally published at ablifeing.blogspot.com.