Evaluating Feature Importance

Deep Learning: Feature Importance Estimates

Q: The paper is about Feature Importance. What is it?

Just a premise: it is well known that, for example, a CNN trained to solve a regression problem internally builds a hierarchical semantic representation of the input image. Each layer output can be considered a semantic representation, and the deeper the layer, the more semantic (and the less spatial, because of spatial pooling operations) the representation becomes, up to the final layer output (the bottleneck feature).

CNN schematic
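As a minimal sketch of this idea (the toy architecture and layer sizes below are made up for illustration, not taken from any specific paper), here is how the spatial resolution shrinks while the channel count grows as we go deeper, down to the bottleneck feature:

```python
import torch
import torch.nn as nn

# A toy CNN: each block halves the spatial resolution (spatial pooling)
# while increasing the number of channels, so deeper outputs are
# "less spatial, more semantic".
blocks = nn.ModuleList([
    nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
    nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
    nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
    nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten()),  # bottleneck feature
])

x = torch.randn(1, 3, 64, 64)  # a fake input image
for i, block in enumerate(blocks):
    x = block(x)
    print(f"layer {i}: {tuple(x.shape)}")
# layer 0: (1, 16, 32, 32)
# layer 1: (1, 32, 16, 16)
# layer 2: (1, 64, 8, 8)
# layer 3: (1, 64)   <- bottleneck feature, no spatial dimensions left
```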

One of the biggest open problems in Deep Learning is making these semantic representations human understandable (at least to some extent); otherwise the whole CNN comes across as a “black box”.

Getting the semantics of the representation is hard, but as an alternative we can try to estimate the importance of a specific feature for solving the final task, since we have a metric to evaluate how the DNN is performing (e.g. precision).

Q: Why is the problem of understanding feature importance interesting, and how will you tackle it?

As said before, DNNs are pretty mysterious objects: they work very well, but there is no clear explanatory theory for this yet, just some ideas and conjectures.

The problem of getting the semantics of a specific feature assumes there is some kind of ontological mapping from the CNN’s data-driven feature space to the human language space, but there is no guarantee this assumption holds.

Furthermore, it is not even clear this would be so useful. Let me explain with an example: let’s assume some genius programmer, whom we’ll call Bob, has coded the best pedestrian detector ever: no data-driven approach, just an explicit algorithm.

Bob, the genius programmer

This code will probably have a lot of intermediate state represented as variables (if you are a zealot functional programmer you will probably not like this assumption, sorry for that :) ), and we hope the coder has spent some time giving these variables meaningful names, to help us understand the code.
Let’s now assume Bob is not very disciplined in variable naming, so the names are not very meaningful, and we try to make sense of them by studying the code (or we might go ask Bob).

That is for sure a fairly big job, but the point is: do we really need to go so deep into the code and get such a fine-grained understanding of it, or would a more coarse-grained understanding be sufficient to make sense of the strategy?
We could, for example, use the semantics Bob put into his code (hence focusing on the donut, not on the hole) and guess what the overall architecture is.

This is to say it might be pretty hard to map each variable’s semantics into human language, but above all it is likely to be much less useful than understanding the underlying idea: Bob’s implementation could in fact be suboptimal, e.g. some of its variables could be useless.

The problem is that guessing the underlying idea of an implementation written in a certain programming language through static analysis (just looking at the code) is orders of magnitude easier than doing the same on a trained CNN (even though it can still be a very hard task, depending on the semantics the language provides and on the programmer’s capability / willingness to write “readable code”), hence we need an alternative strategy.

An approximation could be to estimate a feature’s importance in solving the specific task the CNN has been trained for: provided we have a proper metric to evaluate the CNN results (e.g. precision), we can perform controlled experiments at runtime, consisting of zeroing some features and observing the effect on the final results.
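As a concrete sketch of such a controlled “lesion” experiment (the names model.backbone, model.head and eval_metric below are hypothetical placeholders, not taken from the paper), we could zero one bottleneck feature at evaluation time and re-measure the metric:

```python
import torch

@torch.no_grad()
def ablated_score(model, eval_metric, loader, feature_idx=None):
    """Evaluate the model, optionally zeroing one bottleneck feature.

    Hypothetical names: `model.backbone` maps images to the bottleneck
    feature vector, `model.head` maps features to predictions, and
    `eval_metric(preds, targets)` returns a scalar score (e.g. precision).
    """
    scores = []
    for images, targets in loader:
        feats = model.backbone(images)        # shape: (batch, n_features)
        if feature_idx is not None:
            feats[:, feature_idx] = 0.0       # the controlled "lesion"
        preds = model.head(feats)
        scores.append(eval_metric(preds, targets))
    return sum(scores) / len(scores)          # rough average over batches
```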

At the end of a properly designed set of experiments, we should be able to estimate a feature importance ranking for the target task: it is not the same as getting the feature semantics, but we can say this feature is more important than that feature in solving that task.
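Building on the hypothetical ablated_score sketch above, such a ranking could be estimated by comparing each ablated score with the unperturbed baseline (a larger drop means a more important feature):

```python
def importance_ranking(model, eval_metric, loader, n_features):
    """Rank bottleneck features by the metric drop caused by zeroing each one."""
    baseline = ablated_score(model, eval_metric, loader)            # no ablation
    drops = {
        i: baseline - ablated_score(model, eval_metric, loader, i)  # zero feature i
        for i in range(n_features)
    }
    # Sort so the features whose removal hurts the metric most come first.
    return sorted(drops.items(), key=lambda kv: kv[1], reverse=True)
```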
