Aug 23, 2017 · 1 min read
These are surprising results. Gradient boosting has no apparent memory, yet it does about as well as a network with many LSTM layers; doesn't that imply you don't really need memory? I'm also struck that the only significant difference between the models is in their reliability scores. I'd be very interested in two experiments: (1) what is the AUC difference between the two models? (2) after calibration, what do the reliability curves look like for each? If you're ever inclined to run these, I'd be very interested in seeing the results.
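For concreteness, the two experiments I have in mind could be sketched roughly like this with scikit-learn. Everything here is illustrative: the predicted probabilities are synthetic stand-ins (I obviously don't have the models' actual outputs), and the variable names `p_gbm` / `p_lstm` are my own labels for the two models' scores on a shared test set.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.calibration import calibration_curve

# Synthetic stand-ins for the two models' predicted probabilities.
# In the real experiment these would be each model's scores on the test set.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
p_gbm = np.clip(0.3 * y_true + rng.normal(0.35, 0.2, size=1000), 0, 1)
p_lstm = np.clip(0.3 * y_true + rng.normal(0.35, 0.3, size=1000), 0, 1)

# Experiment (1): AUC for each model, and the difference between them.
auc_gbm = roc_auc_score(y_true, p_gbm)
auc_lstm = roc_auc_score(y_true, p_lstm)
print(f"AUC (gradient boosting): {auc_gbm:.3f}")
print(f"AUC (LSTM):              {auc_lstm:.3f}")
print(f"AUC difference:          {auc_gbm - auc_lstm:+.3f}")

# Experiment (2): reliability curve, i.e. observed frequency of the
# positive class vs. mean predicted probability in each bin.
frac_pos, mean_pred = calibration_curve(y_true, p_gbm, n_bins=10)
for mp, fp in zip(mean_pred, frac_pos):
    print(f"mean predicted {mp:.2f} -> observed frequency {fp:.2f}")
```

A perfectly calibrated model would have its reliability curve lie on the diagonal (observed frequency equal to mean predicted probability in every bin); running this before and after calibration (e.g. Platt scaling or isotonic regression) would show whether the reliability gap between the two models survives recalibration.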
