One of our talented data scientists, Rafah El-Khatib, represented ING WB Advanced Analytics during this year’s Data Science Festival. London’s Data Science Festival has been successfully organised for three years now, gathering more than two thousand data enthusiasts and top tech companies. Rafah’s talk was titled “Feature Selection Best Practices — LOFO, and a Survey of Key Feature Importance Packages”.
In a previous blog we talked about Leave-One-Feature-Out Importance, a promising open-source contribution made by WB Advanced Analytics data scientists Ahmet Erdem, Rafah El-Khatib, Eva van Weel and Stephane Collot.
The idea for the feature importance Python package was initiated by Ahmet Erdem during WBAA’s Data Science Focus Sessions, where our data scientists exchange insights on their projects’ progress or the technical struggles they might be experiencing, and from there come up with smart solutions.
“There are many ways of calculating feature importance. Therefore, when you talk about one, you need to explain it all the time, and all of them have different assumptions. We wanted to have a model- and validation-scheme-agnostic method that is directly related to the performance, so that we can use it in most cases. It was also something that I was doing manually for Kaggle competitions, so it would be good to automate. LOFO is developed fully open source on GitHub” — Ahmet Erdem
Rafah explained what triggered her to contribute to the LOFO open-source project.
“I’ve needed to evaluate feature importance in the context of feature selection in my previous work and had several approaches, one of which was a specific case of LOFO that I was reproducing every time I needed it. So when Ahmet suggested making a more generic package that can solve many problems at once, I was more than happy to contribute!”
Key areas of Rafah’s presentation:
· Definition and classification of feature importance metrics
· Explanation of the difference between global and individualized metrics, and between value-based and metric-based approaches
· A survey of feature importance metrics and an explanation of the applications for which each is or isn’t suitable
· Explanation of LOFO (Leave-One-Feature-Out) Importance, the problems it solves (i.e. when it is suitable to use), and its fast approximation FLOFO (Fast LOFO)
· Description of individualized FLOFO, calculated per record, which the team is currently working on, and a comparison to Shapley values
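For readers new to the idea, the core of LOFO can be sketched in a few lines: evaluate a model’s cross-validated score with all features, then re-evaluate it with each feature left out in turn; the drop in score is that feature’s importance. The sketch below illustrates this general idea only — it is not the actual implementation of the open-source `lofo-importance` package, and the ordinary-least-squares model, fold count, and toy data are assumptions made purely for illustration.

```python
import numpy as np

def fit_predict(X_tr, y_tr, X_te):
    # Ordinary least squares with an intercept column; in practice any
    # model could be plugged in here (LOFO is model-agnostic).
    A = np.c_[np.ones(len(X_tr)), X_tr]
    w, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return np.c_[np.ones(len(X_te)), X_te] @ w

def lofo_importance(X, y, n_splits=4):
    # Importance of feature j = mean drop in cross-validated score
    # when the model is retrained without feature j.
    rng = np.random.default_rng(0)
    folds = np.array_split(rng.permutation(len(X)), n_splits)

    def cv_score(cols):
        scores = []
        for k in range(n_splits):
            te = folds[k]
            tr = np.concatenate([folds[i] for i in range(n_splits) if i != k])
            pred = fit_predict(X[tr][:, cols], y[tr], X[te][:, cols])
            scores.append(-np.mean((pred - y[te]) ** 2))  # negative MSE: higher is better
        return np.mean(scores)

    all_cols = list(range(X.shape[1]))
    base = cv_score(all_cols)
    return {j: base - cv_score([c for c in all_cols if c != j]) for j in all_cols}

# Toy data: feature 0 drives the target, feature 1 is pure noise,
# so LOFO should rank feature 0 far above feature 1.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)
imp = lofo_importance(X, y)
```

FLOFO speeds this up by shuffling a feature’s values within groups of similar records on the validation set instead of retraining the model for every left-out feature.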
The audience was very enthusiastic and interested in Rafah’s talk and the LOFO package:
“Some of them were already investigating feature importances and comparing them; others were reminded of why they should always evaluate feature importances, for model debugging for example. During my Q&A session, questions ranged from asking for more detail on LOFO to suggesting variations and comparing it to other feature importance packages” — Rafah
Watch Rafah’s presentation below: