Haihan Lan
Sep 5, 2018

Hi Rodrigo,

If you didn’t do a simple train-test split (or something fancier like K-fold cross-validation), then you don’t have a reliable way of judging how well your RF model fits the data; you need to look at both training and validation performance. If the fit is bad or marginal (say, validation AUC below 0.6), then the variable importances the RF produces are unreliable and may be totally meaningless (and you don’t want to tell your boss/product owner/scrum master that, do you? lol…). So it’s always a good idea to do a train-test split and check how well your model fits the data before you interpret outputs like variable importance in the case of RFs.
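For reference, here’s a minimal sketch of that workflow in scikit-learn. The data, split ratio, hyperparameters, and the 0.6 threshold are all illustrative stand-ins, not something from your setup:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data stands in for your real X and y.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Hold out a validation set before fitting anything.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42
)

rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)

# Check the fit on held-out data first...
val_auc = roc_auc_score(y_val, rf.predict_proba(X_val)[:, 1])
print(f"Validation AUC: {val_auc:.3f}")

# ...and only interpret the importances if the AUC clears your bar
# (roughly 0.6 as a floor, per the discussion above).
if val_auc >= 0.6:
    for i, imp in enumerate(rf.feature_importances_):
        print(f"feature {i}: importance {imp:.3f}")
else:
    print("Fit is too weak; don't trust the variable importances.")
```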

Good Luck.
