Homer Simpson is known to be a simple person who does silly things from time to time. Here he decided to cook bread, bacon, and eggs using the microwave. We find it funny to watch, but at the same time, we tend to do similar things in data science. Don't be like Homer; use the proper tools. This shot from The Simpsons ("Homer the Smithers", Season 7 Episode 17) is believed to qualify as fair use

Local Interpretable Model-agnostic Explanations (LIME) is a popular Python package for explaining individual predictions of text classifiers, classifiers that act on tables (NumPy arrays of numerical or categorical data), or image classifiers. LIME was first introduced in the 2016 paper "'Why Should I Trust You?': Explaining the Predictions of Any Classifier", and since then the LIME project repository has reached almost 8k stars (for comparison, scikit-learn has 42k stars).

While it is one of the most popular approaches for explaining the predictions of any classifier, LIME has been criticized several times in the research community: LIME suffers from label and data shift, its explanations depend on the choice of hyperparameters (yes, LIME has hyperparameters), and even similar points may have different explanations. …
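LIME's core idea can be sketched in a few lines: perturb the instance of interest, weight the perturbed samples by proximity to it, and fit a weighted linear surrogate whose coefficients serve as local feature importances. The sketch below is a simplified illustration of that idea (using scikit-learn, with an arbitrary `kernel_width`), not the lime package's actual implementation:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# A black-box model on synthetic tabular data
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

def explain_locally(instance, predict_proba, n_samples=1000, kernel_width=0.75):
    """Simplified LIME-style local explanation (illustrative only)."""
    rng = np.random.default_rng(0)
    # 1. Perturb the instance with Gaussian noise
    perturbed = instance + rng.normal(scale=1.0, size=(n_samples, instance.shape[0]))
    # 2. Weight perturbed samples by proximity to the original instance
    distances = np.linalg.norm(perturbed - instance, axis=1)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)
    # 3. Fit a weighted linear surrogate to the black-box predictions
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(perturbed, predict_proba(perturbed)[:, 1], sample_weight=weights)
    # The coefficients act as local feature importances
    return surrogate.coef_

coefs = explain_locally(X[0], black_box.predict_proba)
print(coefs)  # one coefficient per feature
```

Note how `kernel_width` directly controls which samples count as "local" — this is exactly the kind of hyperparameter sensitivity the criticism above refers to.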


Homer Simpson is known to be a simple person who does silly things from time to time. Here he decided to cook bread, bacon, and eggs using the gas stove as if it were a bonfire, a tool that isn't suitable for such purposes. We find it funny to watch, but at the same time, we tend to do similar things in data science. Don't be like Homer; use the proper tools. This shot from The Simpsons ("Homer the Smithers", Season 7 Episode 17) is believed to qualify as fair use

Data scientists need feature importance calculations for a variety of tasks. Importances can help us understand whether we have biases in our data or bugs in our models. Importance is also frequently used for understanding the underlying process and making business decisions. Indeed, the model's top important features may give us inspiration for further feature engineering and provide insights into what is going on.

There are many ways to calculate feature importance nowadays. Some of them are based on the model's type, e.g., coefficients of linear regression, gain importance in tree-based models, or batch norm parameters in neural nets (BN params are often used for NN pruning, i.e., neural network compression; for example, this paper addresses CNNs, but the same logic could be applied to fully-connected nets). …
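The first two model-specific importances mentioned above are readily available in scikit-learn; a minimal sketch on synthetic data (model and data choices here are arbitrary, for illustration only):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=300, n_features=4, random_state=42)

# Linear model: importances from (absolute) coefficients.
# Features should be on comparable scales for this to be meaningful.
linear = LogisticRegression(max_iter=1000).fit(X, y)
linear_importance = np.abs(linear.coef_[0])

# Tree-based model: impurity-based (gain) importances, normalized to sum to 1
trees = GradientBoostingClassifier(random_state=42).fit(X, y)
tree_importance = trees.feature_importances_

print(linear_importance)
print(tree_importance)
```

Note that the two rankings need not agree: each importance measure reflects what its own model learned, not some model-free ground truth.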


Photo by Bryan Goff on Unsplash

I just finished reading The Clean Coder: A Code of Conduct for Professional Programmers by Robert C. Martin. An incredible book, ten out of ten. I recommend it to everyone who writes code on a daily basis.

I decided to make some notes that combine advice from the book with some of my own experience in data science and machine learning. I tried to write these notes in an actionable form so that it will be easy to turn them into habits.

  1. Take responsibility
  2. Be responsible for your career
  3. Use TDD when you are going to re-use code
  4. Don’t be afraid to say…
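Point 3 above (use TDD for code you plan to re-use) can be illustrated with a minimal red-green cycle in Python; the `normalize` function and its behavior are made up purely for illustration:

```python
# Step 1 (red): write the test first, before any implementation exists.
def test_normalize():
    assert normalize([2.0, 4.0, 4.0]) == [0.2, 0.4, 0.4]
    assert normalize([0.0, 0.0]) == [0.0, 0.0]  # edge case: all zeros

# Step 2 (green): write the simplest implementation that passes the test.
def normalize(values):
    total = sum(values)
    if total == 0:
        return [0.0 for _ in values]
    return [v / total for v in values]

# Step 3 (refactor): with the test as a safety net, restructure freely.
test_normalize()
```

For re-usable data science code (feature pipelines, metrics, loaders), tests like this pay for themselves the first time someone else calls your function with an input you didn't anticipate.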

About

Denis Vorotyntsev

Sr Data Scientist at Oura, Helsinki
