Problems with Random forests

Dataset generation and model fitting
Scatter Plot of x and y
Model predictions for the train data
Model Predictions for validation data
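
The experiment described above can be reproduced with a minimal sketch along these lines. It assumes scikit-learn's RandomForestRegressor and a synthetic dataset with a linear trend; the slope, noise level, sample count and split point are illustrative choices, not the values used in the original example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic data with an upward trend: y = 2x + noise (illustrative values).
x = np.linspace(0, 10, 200)
y = 2.0 * x + rng.normal(scale=0.5, size=x.shape)

# Time-ordered split: every validation point lies beyond the training range.
split = 150
x_train, y_train = x[:split].reshape(-1, 1), y[:split]
x_valid, y_valid = x[split:].reshape(-1, 1), y[split:]

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(x_train, y_train)

train_pred = model.predict(x_train)
valid_pred = model.predict(x_valid)

# Mean absolute error on each split.
train_mae = np.mean(np.abs(train_pred - y_train))
valid_mae = np.mean(np.abs(valid_pred - y_valid))

print("train MAE:", train_mae)
print("valid MAE:", valid_mae)
print("largest training target:", y_train.max())
print("largest validation prediction:", valid_pred.max())
```

Each tree predicts an average of training targets that fall in a leaf, so no prediction can exceed the largest target seen during training. On the validation range the predictions therefore flatten out while the true values keep rising.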

Ways to solve the extrapolation problem

  1. One way to avoid this situation is to remove unnecessary time-dependent variables and train the model on the remaining features.
  2. When the data really is a time series, as in the example above, there is not much we can do with the forest alone. We can try time series techniques to detrend the data, but the forest itself still cannot extrapolate, so the result is only as good as the estimate of the trend.
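
As a rough illustration of the detrending idea in the second point, the sketch below fits a linear trend, trains the forest on the residuals, and adds the trend back at prediction time. The use of LinearRegression as the trend model and the synthetic data are assumptions made for this example only. On a clean linear trend like this one the approach looks better than it is likely to on real data, where the form of the trend is rarely known.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Same kind of synthetic trending data: y = 2x + noise (illustrative values).
x = np.linspace(0, 10, 200).reshape(-1, 1)
y = 2.0 * x.ravel() + rng.normal(scale=0.5, size=200)

split = 150
x_train, y_train = x[:split], y[:split]
x_valid, y_valid = x[split:], y[split:]

# Baseline: a forest trained directly on the trending target.
plain = RandomForestRegressor(n_estimators=100, random_state=0)
plain.fit(x_train, y_train)
plain_pred = plain.predict(x_valid)

# Step 1: estimate the trend with a simple linear model (an assumption).
trend = LinearRegression().fit(x_train, y_train)

# Step 2: train the forest on what is left after removing the trend.
residuals = y_train - trend.predict(x_train)
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(x_train, residuals)

# Step 3: add the extrapolated trend back to the forest's residual prediction.
combined_pred = trend.predict(x_valid) + forest.predict(x_valid)

plain_mae = np.mean(np.abs(plain_pred - y_valid))
combined_mae = np.mean(np.abs(combined_pred - y_valid))

print("plain forest MAE:", plain_mae)
print("detrended forest MAE:", combined_mae)
```

All of the extrapolation here comes from the linear model. The forest only corrects the residuals inside the range it has seen, so a poorly chosen trend model would carry its error straight into the validation predictions.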

Conclusion

Random forests are robust and work well on most datasets, but like any other model they have limitations. In particular, they do not work well on datasets with time series trends, because they cannot extrapolate beyond the range of target values seen during training. It’s important to identify this early and, in such cases, use other models such as neural nets, which can capture such trends.

Further Reading

  1. https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html
