Sep 5, 2018 · 1 min read
Great post, and I am going to try it on my dataset. I plan to be very careful about removing features that show some collinearity with one another before I have tested them all by cross-validation. As long as there is no significant overfitting due to high variance, I tend to keep these features, or build new features from the correlated ones (and then drop the old ones), rather than simply removing them. Even though this may make the coefficients harder to interpret, in many cases we care less about what those weights mean than about a good prediction. Furthermore, setting the threshold for this removal is also tricky and highly empirical.
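To make the "build new features from correlated ones" idea concrete, here is a minimal pure-Python sketch. The helper names (`pearson`, `zscore`, `combine_correlated`), the greedy pairing, and the 0.9 threshold are all assumptions for illustration, not anything from the post; it also assumes no column is constant. The combined set would still need to be compared against the original features by cross-validation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def zscore(x):
    """Standardize a sequence to zero mean and unit (population) variance."""
    n = len(x)
    m = sum(x) / n
    s = sqrt(sum((a - m) ** 2 for a in x) / n)
    return [(a - m) / s for a in x]


def combine_correlated(columns, threshold=0.9):
    """Greedily replace each highly correlated pair with one averaged feature.

    `columns` maps a feature name to its list of values. The threshold is a
    hypothetical default; in practice it has to be tuned empirically.
    """
    remaining = dict(columns)
    merged = {}
    names = list(columns)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # Skip features that were already merged into a new one.
            if a not in remaining or b not in remaining:
                continue
            r = pearson(remaining[a], remaining[b])
            if abs(r) >= threshold:
                # Flip the sign of negatively correlated features before averaging.
                sign = 1.0 if r > 0 else -1.0
                za = zscore(remaining.pop(a))
                zb = zscore(remaining.pop(b))
                merged[f"{a}+{b}"] = [(u + sign * v) / 2 for u, v in zip(za, zb)]
    remaining.update(merged)
    return remaining


# Toy data: x1 and x2 are nearly collinear, x3 is only weakly related to x1.
features = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10.5],
    "x3": [5, 1, 4, 2, 3],
}
combined = combine_correlated(features, threshold=0.9)
print(sorted(combined))
```

Averaging standardized columns is only one way to combine them; a PCA component over the correlated group would be another reasonable choice.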