Exploiting feature maps to increase end-user trust

Pascal Niville
Ixor
Dec 2, 2019

Machine learning in production, a little story:

Imagine a physician who is responsible for making cancer diagnoses. A new patient comes to see him and, as usual, he connects the patient to the machine that scans for cancer symptoms. After some beeping and buzzing, the machine outputs a single number: the confidence that the patient has cancer.

For this patient the confidence is 0.72. What should the physician do?

In general we can expect that a higher confidence corresponds to a higher likelihood of the patient having cancer, but what is the difference between 0.7 and 0.75? Or between 0.49 and 0.51? Where should the physician draw the line?

Feature maps: a concise input representation

As you can probably feel for yourself, presenting a single output number to the end users of a machine learning application can cause doubt and a loss of trust. To increase the end user's (in this case the physician's) confidence in our applications, we adopted a simple trick to provide more information than just a number: we leverage the intermediate feature maps of our neural network classifier.

Feature maps are the intermediate results produced after every layer of the network. When we look at the feature maps of the last layers, we typically find concise representations of the input that are already separated in a way that favours our classification.

Feature maps of a convolutional neural network. source
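To make this concrete, here is a minimal sketch of how such a feature vector can be pulled out of a classifier. The ResNet-18 backbone and the choice of layer are assumptions for illustration, not the network used in our projects; the idea is simply to take the activations just before the classification head.

```python
import torch
import torchvision.models as models

# Illustrative only: a ResNet-18 stands in for whatever classifier is in use.
model = models.resnet18()
model.eval()

# Drop the final fully-connected layer so a forward pass returns the pooled
# feature map instead of class scores.
backbone = torch.nn.Sequential(*list(model.children())[:-1])

def feature_vector(image_batch):
    """Return one feature vector per input image (here of shape [N, 512])."""
    with torch.no_grad():
        features = backbone(image_batch)    # [N, 512, 1, 1]
    return features.flatten(start_dim=1)    # [N, 512]
```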

In analogy to word embeddings in NLP, similarities can be calculated between the vectors of these feature maps. In general, similar inputs will have similar feature vectors.
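The post does not prescribe a particular metric; cosine similarity, as commonly used for word embeddings, is an assumed but natural choice:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors: values near 1.0 mean the
    vectors point in the same direction, values near 0 mean they are unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```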

“I have seen similar inputs before”

In production, the feature vector of a new input can be compared to the feature vectors of the data used for training. The most similar training inputs can then be presented to the end user together with the model's confidence. This way we let our model say: "I predicted this because it is similar to these inputs I have seen during training."
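A sketch of that lookup, assuming the training feature vectors have been precomputed and stored as the rows of a NumPy array (the variable names are illustrative):

```python
import numpy as np

def top_k_similar(query_vec, train_vecs, k=5):
    """Return the indices and cosine similarities of the k training samples
    whose feature vectors are closest to the query vector."""
    train_norm = train_vecs / np.linalg.norm(train_vecs, axis=1, keepdims=True)
    query_norm = query_vec / np.linalg.norm(query_vec)
    sims = train_norm @ query_norm           # cosine similarity per training sample
    top_idx = np.argsort(sims)[::-1][:k]     # indices of the k highest similarities
    return top_idx, sims[top_idx]
```

The returned indices can then be mapped back to the original training images and shown next to the prediction.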

“This is something new”

If no similar inputs are found in the training set, we can assume that the new input lies outside the domain of what the model has seen during training. The prediction therefore has to be assessed with care by the end user.
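One simple way to operationalise "this is something new", reusing top_k_similar from the sketch above; the threshold value is an assumption and would in practice be tuned on held-out data:

```python
def is_novel(query_vec, train_vecs, threshold=0.8):
    """Flag an input as novel when even its closest training sample is less
    similar than the chosen (assumed) threshold."""
    _, sims = top_k_similar(query_vec, train_vecs, k=1)
    return bool(sims[0] < threshold)
```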

The image below illustrates what is meant by the statement above. It shows the model's confidence intervals in relation to a changing input. When inputs move outside the domain covered by the training data, the confidence intervals of the corresponding predictions start to diverge. In other words, neural networks do not generalise well to inputs that differ greatly from the data seen during training.

Neural network confidence intervals vs covered training data. Image taken from [1]

In the context of retraining, the collection of new data samples can be optimised by retaining only the samples with little similarity to the original dataset.
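Sketched in the same style, again with an assumed similarity threshold:

```python
def select_for_retraining(candidate_vecs, train_vecs, threshold=0.8):
    """Keep only the candidate samples that look unlike anything in the
    original training set, so annotation effort goes to genuinely new data."""
    keep = []
    for i, vec in enumerate(candidate_vecs):
        _, sims = top_k_similar(vec, train_vecs, k=1)
        if sims[0] < threshold:
            keep.append(i)
    return keep
```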

Interesting reads on the generalisation topic are:
Neural Arithmetic Logic Units
Everything regarding Bayesian neural networks, e.g. convolutional Bayesian networks

Some use cases:

Leveraging feature maps for leukaemia detection

An example project where we used this idea of leveraging feature maps to assist the end user's decision making is a project for leukaemia detection. In this project, different cells in the bone marrow need to be classified. A more detailed description of the project can be found here.

In the case of a cell that is difficult to classify, the physician can get a better feel for why our classifier made a certain prediction by looking at the most similar cells in the training dataset.

Similar inputs from the training set are displayed for the highlighted cell. The image is a screenshot of our IxorThink portal.

Leveraging feature maps in colon cancer detection

In some use cases it doesn't make much sense to map similar inputs to the prediction. For example, in a project on colon tissue classification, where we classify tiles of tissue, presenting similar tissue tiles from the training dataset would appear rather random to the physician.

A more informative approach in this project was to highlight similarly predicted tiles in the whole tissue slide itself. This relieves the prediction of its abstractness and gives it more context, which increases our end users' trust in the model.
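The post does not describe how that highlighting is implemented, but a hypothetical sketch, assuming one feature vector per tile, a rectangular tile grid, and the cosine_similarity helper from the earlier sketch, could look like this:

```python
import numpy as np

def similarity_overlay(tile_vecs, grid_shape, query_vec):
    """Project each tile's similarity to the queried tile back onto the slide
    grid, so regions with similar predictions can be highlighted in context."""
    sims = np.array([cosine_similarity(query_vec, v) for v in tile_vecs])
    return sims.reshape(grid_shape)   # one similarity value per tile position
```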

References

[1] Blundell, Charles, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. “Weight uncertainty in neural networks.” arXiv preprint arXiv:1505.05424 (2015).

At IxorThink, the machine learning practice of Ixor, we are constantly trying to improve our methods to create state-of-the-art solutions. As a software-company we can provide stable products from proof-of-concept to deployment. Feel free to contact us for more information.
