Less is More

Geoffrey Gordon Ashbrook
Published in Wooden Information · May 8, 2020

Predict The Future…Using Only The Past

Sometimes less is better for finding something. Why might you need less? Less information…less data…It may sound backwards. Isn’t more data better? Here is an example.

Banks:

Sometimes banks want to give (loan) people money. Sometimes banks do not want to give people money. What if…

What if we could predict? What if we could use information to say: Banks will like this person, or banks will not like that person.

From this example we will see how less information can be much better.

Think for a moment:

We will look at a person and then say if a bank will like that person. What should we look at?

The bank has lots of information about people that we COULD use. Should we use all of it?

Let’s take a look at one area…

There are nine ‘classes’ of information here. Let’s think about why we might not want to look at all of this information.

When we look at a person (when a bank looks at a new person) what CAN we see?

Can we look at a person’s job? Sure. They have a job (or maybe they don’t).

Can we look at a person’s house? Sure. They live somewhere, or maybe nowhere.

How much money did the person give the bank last month?…uh…

Did the person pay the bank a fine (a penalty)?…hmm….

Remember, this is a new person to the bank. The bank and the person are strangers. The bank has not given the person anything, and the person has not given the bank anything. So any question about the person giving something to the bank, or the other way around, does not make any sense: that would be information from the future. But we cannot time travel; we can only model what we can know NOW.

If we try to use information “from the future,” information that we cannot know now, then our model will not work.

Important Review Point: For each kind of information you use, ask: “Can we know this NOW?”

So we throw away all of that “future information.”

Now we have only information from the past. We have less information, and our prediction will be much better. Our understanding of what is important is much better. But why?
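To make this concrete, here is a tiny sketch (in Python with pandas; the column names and values are invented for this illustration, not taken from any real bank data) of what “throwing away future information” can look like in code:

```python
# A minimal sketch (column names are made up for illustration) of
# dropping "future information" before training a model.
import pandas as pd

# Imagine a table of loan applicants, one row per person.
df = pd.DataFrame({
    "job": ["teacher", "none", "driver"],
    "housing": ["rent", "own", "rent"],
    "months_since_first_payment": [3, 7, 1],   # only known AFTER a loan exists
    "late_payment_fine": [0, 1, 0],            # also "from the future"
    "loan_approved": [1, 1, 0],                # the target we want to predict
})

# Columns we could not know at application time: throw them away.
future_columns = ["months_since_first_payment", "late_payment_fine"]
X = df.drop(columns=future_columns + ["loan_approved"])
y = df["loan_approved"]

print(X.columns.tolist())  # only "past" information remains: ['job', 'housing']
```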

What happens if we try to use (cheating, illegal) information from the future?

Note: one single (‘primary’) feature is far too strong; it dominates the prediction.

Our model will look very good, it will look…too good. But it won’t be useful in a real situation.

How can we know what features may be more important?

In More Detail

Let’s take a look at three more ways to be careful about whether something is important or not.

1. ~Illegal Features

2. (Relative) Feature-Importance

3. Evaluating Predictions: Using The Confusion Matrix

1. Illegal Features

Previously we looked at ‘illegal’ features that were too good because they cheated into the future.

But there are many tools that can help us see what impact a feature is having (if any) on the prediction. There are tools that can help us to focus on what is good to look at or what should be tossed away.

Even just seeing a list of which features contribute most within a model can be very interesting.
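For example, here is a small illustrative sketch (my own toy data and invented feature names, not real bank records) of asking a scikit-learn Random Forest for that ranked list:

```python
# A hedged sketch of one such tool: the built-in feature importances of a
# scikit-learn RandomForestClassifier (data and feature names are invented).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

X = pd.DataFrame({
    "age":       [25, 40, 35, 52, 23, 44, 31, 60],
    "has_job":   [1, 1, 0, 1, 0, 1, 1, 1],
    "owns_home": [0, 1, 0, 1, 0, 1, 0, 1],
})
y = [0, 1, 0, 1, 0, 1, 0, 1]  # 1 = bank liked this person

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# A simple ranked list of which features the model leaned on most.
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```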

But how are they contributing? Do they all contribute in the same way?

Here you can see features may be contributing to ‘Yes’ or ‘No’ outcomes.

2. Feature Importance:

What is a feature? “Feature” is the word used for one specific kind of observation, something that answers a very specific question: “Where is the person from?” or “What is the person’s job?”

Looking at feature importance can be useful in two somewhat opposite ways.

If a feature is ‘too good,’ then it may be ‘information from the future.’

If a feature is ‘useless,’ then it can also be ignored.

We can look at positive cases:

And we can look at negative cases:

These (above) images are called SHAP plots, based on “Shapley” values (pronounced: Shap-lee). They are great because they let you zoom in on one specific case and see which features made the biggest difference contributing to the prediction. (This can also be used to look at cases that were predicted incorrectly, which lets you see into how the model works, despite some people complaining that models are impossible to examine.)
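If you are curious how such a per-case explanation is produced, here is a rough, illustrative sketch using the open-source shap library (my own toy data again, not the original post’s code; the exact shape of the output depends on the shap version you have installed):

```python
# A hedged, illustrative sketch of a per-case explanation with the `shap`
# library. Data and feature names are invented for this example.
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier

X = pd.DataFrame({
    "age":       [25, 40, 35, 52, 23, 44, 31, 60],
    "has_job":   [1, 1, 0, 1, 0, 1, 1, 1],
    "owns_home": [0, 1, 0, 1, 0, 1, 0, 1],
})
y = [0, 1, 0, 1, 0, 1, 0, 1]  # 1 = bank liked this person

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)    # TreeExplainer handles tree ensembles
shap_values = explainer.shap_values(X)   # one contribution per feature, per person

# For one specific person (row 0), each value says how much each feature pushed
# the prediction toward 'yes' or toward 'no'. Depending on the shap version,
# shap_values is a list (one array per class) or a single array, so check its
# structure before indexing into it.
print(X.iloc[0])
print(shap_values)
```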

3. Evaluating Predictions:

When we make a prediction, how can we say if it was a good prediction or a bad prediction?

In everyday terms we can speak about a prediction being ‘good’ or ‘bad.’ But to be more precise: without context it is not easy to say what would be good or bad, or, more specifically, to clearly describe what would make the model good or bad for the situation.

For example, if it is some medical test: maybe we care more about not missing any possible problem, even if some people are incorrectly predicted to be sick. Or maybe, because of the situation, we care more about Not Wrongly predicting any illness.

To help with this…there is an amazing tool (I honestly think it is one of the most wonderful things ever discovered in the history of the Universe…) called: The Confusion Matrix!

The name may sound…confusing…(not a joke, that is really the name) but here is what a “confusion matrix” does: a confusion matrix clearly focuses on four things:

1. What was predicted to be ‘yes’?

2. What was predicted to be ‘no’?

3. What really is ‘yes’ (what should have been predicted to be ‘yes’)?

4. What really is ‘no’ (what should have been predicted to be ‘no’)?

That’s all!

Ta Da! An example From Bank Loans…so beautiful…
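Here is also a tiny sketch of computing a confusion matrix with scikit-learn, using made-up loan decisions purely for illustration:

```python
# A minimal sketch of the confusion matrix itself, with made-up loan decisions.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # what really happened (1 = good loan)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # what the model predicted

# Rows are the real answers, columns are the predictions:
# [[really 'no'  & predicted 'no',  really 'no'  & predicted 'yes'],
#  [really 'yes' & predicted 'no',  really 'yes' & predicted 'yes']]
print(confusion_matrix(y_true, y_pred))
```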

This may sound ‘too simple’ to be special, but this information is amazingly useful. Based on these four simple-seeming specific questions we can calculate some really amazing (amazingly different from each other) ‘scores’ to compare predictions. We can be very clear about what a prediction does or does not do, and so when it will or won’t be “good” or “bad” in a specific situation.

Here are some examples. (They are not completely described here, as perhaps books could be written about each one and still not exhaust all the angles.)

  • Accuracy (this has a technical meaning aside from the every-day meaning: the fraction of all predictions that were correct)
  • Recall (most people have never even heard of this: of the cases that really are ‘yes’, how many did the model catch?)
  • Precision (again, this is a technical meaning that is not the same as “accurate”: of the cases predicted ‘yes’, how many really are ‘yes’?)
  • ROC-AUC (AUC means: Area Under the Curve. It uses True and False Positive rates to measure how well a model separates the two groups (note: groups/classes, not continuous measured numbers))
  • F1: the harmonic mean of Precision and Recall, balancing the two
  • False Positives: cases predicted ‘yes’ that really are ‘no’
  • False Negatives: cases predicted ‘no’ that really are ‘yes’
  • True Positives: cases predicted ‘yes’ that really are ‘yes’
  • True Negatives: cases predicted ‘no’ that really are ‘no’

(Here is a quick visual guide to False/True Positive/Negative:)
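And here is a small sketch of computing several of the scores above from the same made-up predictions (again with scikit-learn, purely for illustration):

```python
# A sketch of several of the scores above, computed with scikit-learn from
# the same made-up loan decisions as in the previous sketch.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("accuracy :", accuracy_score(y_true, y_pred))   # fraction predicted correctly
print("precision:", precision_score(y_true, y_pred))  # of the predicted 'yes', how many were really 'yes'
print("recall   :", recall_score(y_true, y_pred))     # of the real 'yes', how many did we catch
print("f1       :", f1_score(y_true, y_pred))         # harmonic mean of precision and recall
print("roc-auc  :", roc_auc_score(y_true, y_pred))    # ideally computed from predicted probabilities, not hard labels
```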

For any model, we need to look at the situation to evaluate how to evaluate the predictions (Yes, we are evaluating the evaluation!). Very exciting.

There are many other websites and blogs on this great topic. e.g.

https://towardsdatascience.com/beyond-accuracy-precision-and-recall-3da06bea9f6c

https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5

Quiz:

What have you learned today? If anything, hopefully you have learned to ask more questions. For example, if someone says: “I have a model that predicts bank loans with 99.999999% accuracy,” then you should ask whether they used future information.

Accuracy of Prediction of Random Forest Classifier

If someone says, “Here is a fancy sounding Random Forest Classifier model and the results were 83% accurate,” you should ask what exactly they mean by “accuracy” and whether that fits the situation you want. What does the whole confusion matrix look like? What is the recall? The precision? The F1? The False Positives? And most of all, what does the situation call for?

(originally published Oct 24, 2019)
