False positive and false negative. Type I error vs Type II error explained

When a person learns about hypothesis testing, they are often confronted with the two errors — false positive and false negative, or type I error and type II error.

365 Data Science · Jun 8, 2017 · 10 min read


I won't pretend I was a big fan of them either. To me, the two errors sounded like the most useless piece of knowledge. Over the years, however, I had a change of heart. Now I have reached the point where I sound like a professor encouraging her students to distinguish between the two errors even before they have started their first research project.

Here is an explainer video we made.

We hope it makes things clearer.

However, the purpose of this article is not to define or distinguish between the two errors. The explainer video does that well enough. This article is about comparing the two errors in different settings and showing their practical implication.

A quick recap.

Type I error is when you reject a true null hypothesis. It is also called a false positive. Type II error is when you accept a false null hypothesis, a.k.a. a false negative.
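If you prefer to see the two errors in action, here is a minimal simulation sketch (it is not part of the original explainer; the sample size, effect size, and significance level are assumptions picked purely for illustration). It runs many t-tests, first when the null is actually true and then when it is actually false, and counts how often each mistake happens.

```python
# A minimal simulation sketch (illustrative only): the sample size, effect
# size, and alpha below are assumptions, not values from the article.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05            # significance level
n, n_trials = 30, 10_000

# Type I error: the null is TRUE (the mean really is 0), yet we reject it.
rejections_when_null_true = 0
for _ in range(n_trials):
    sample = rng.normal(loc=0.0, scale=1.0, size=n)
    if stats.ttest_1samp(sample, popmean=0.0).pvalue < alpha:
        rejections_when_null_true += 1

# Type II error: the null is FALSE (the true mean is 0.3), yet we fail to reject it.
misses_when_null_false = 0
for _ in range(n_trials):
    sample = rng.normal(loc=0.3, scale=1.0, size=n)
    if stats.ttest_1samp(sample, popmean=0.0).pvalue >= alpha:
        misses_when_null_false += 1

print(f"Empirical type I error rate:  {rejections_when_null_true / n_trials:.3f}")  # close to alpha
print(f"Empirical type II error rate: {misses_when_null_false / n_trials:.3f}")     # this is beta
```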

Alright. Which error is more serious? When I was learning, I read in several places that the type I error is the more serious one. I still think this is broadly correct, but not in every case.

It all depends on the way you structure the study. I have always followed the statistical convention of stating the null hypothesis as the claim I am trying to reject.

For example, let’s say I want to see if this article is going to perform better than the average of the other articles we have posted.

My null hypothesis will be: the number of reads of this article is less than or equal to the average number of reads of similar articles we have posted.

If I reject the null hypothesis, this means one of two things.

1. This article performed above average

2. I have made a type I error. I rejected a null hypothesis that was true. My test showed that the article performed above average, but in fact, it did not. I got a false positive.

Let’s say we are in the second case and I got a false positive. What are the implications? I am led to believe that this article was a successful one, so I take on the challenge of writing several similar articles. But because the original article was bad in the first place, I will soon find out that nobody is reading the blog. Devastating for my self-esteem and career.

What about the type II error — the false negative?

The false negative would occur when my test shows that this article is not even mediocre, when it is actually a masterpiece (compared to our other articles, of course). If that's the case, being the data driven people that we are, we will take remedial action and drop this type of article altogether. This is not necessarily a bad thing. As we don't want to go out of business, we will try to come up with much better ideas. Therefore, the false negative would probably lead to more motivation and innovation.

Alright. Let’s compare the two cases. The false positive leads to an inefficient and boring blog, while the false negative could bring up motivation and innovation. If you make a comparison between false positive and false negative, it is obvious the false positive is much, much worse.

Now, a crucial consideration is the design of the study. I CHOSE the null hypothesis in a specific way. Had I swapped the null and alternative hypotheses, the errors would have been swapped, too. Let me show you.

New null hypothesis: the number of reads of this article is more than the average number of reads of similar articles I’ve written.

In a false positive situation, I would reject a null hypothesis that is true. So, the test would show that my masterpiece is actually mediocre or worse. Remember this phrase? That was the false negative from the previous example.

What this shows is that the two errors are interchangeable. Therefore, it is all about the design of your study.
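To make the swap concrete, here is a hedged sketch. Pretend we track the article's reads per day and compare them with the average daily reads of our past articles; the read counts and the benchmark are made-up numbers. The same data is tested one-sided in each direction, and whatever counts as the false positive under one design becomes the false negative under the other.

```python
# A hedged sketch with hypothetical numbers: the same data tested with the
# alternative pointing each way.
import numpy as np
from scipy import stats

benchmark_avg = 1200                                      # assumed average daily reads of past articles
reads = np.array([1340, 1180, 1420, 1250, 1390, 1310])    # hypothetical daily reads of the new article

# Original design -- H0: reads <= average, H1: reads > average.
# Rejecting a true H0 here is the false positive: the article looks
# above average when it actually isn't.
res_greater = stats.ttest_1samp(reads, popmean=benchmark_avg, alternative="greater")

# Swapped design -- H0: reads >= average, H1: reads < average.
# Now the false positive is "the article looks below average when it isn't",
# which was the false negative of the original design.
res_less = stats.ttest_1samp(reads, popmean=benchmark_avg, alternative="less")

print(f"H1 'above average': p = {res_greater.pvalue:.3f}")
print(f"H1 'below average': p = {res_less.pvalue:.3f}")
```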

Alright. That was an interesting piece of trivia. Nobody ever thought of that (NOT).

Bravo.

How is that useful?

Well, we know we cannot minimize both errors at the same time. Usually, when you mitigate one, the chance of committing the other increases. This is a weakness, but also a strength.

You can mitigate one of them.

What did I tell you earlier? If you swap the hypotheses, you are swapping the two errors as well. Therefore, you can design your experiment in a way that helps you avoid the bigger problem.
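Here is a quick sketch of that trade-off, using the same kind of simulation as above (again, the effect size, sample size, and the grid of alpha values are assumptions chosen for illustration): as you make the significance level stricter to protect against false positives, the rate of false negatives climbs.

```python
# A rough sketch of the alpha-beta trade-off (effect size, sample size, and
# the alpha grid are illustrative assumptions): tightening alpha cuts false
# positives but raises false negatives for the same study.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, n_trials, true_mean = 30, 5_000, 0.3   # H0: mean = 0 is actually false here

for alpha in (0.10, 0.05, 0.01):
    misses = 0
    for _ in range(n_trials):
        sample = rng.normal(loc=true_mean, scale=1.0, size=n)
        if stats.ttest_1samp(sample, popmean=0.0).pvalue >= alpha:
            misses += 1   # failed to reject a false null -> type II error
    print(f"alpha = {alpha:.2f} -> empirical type II error rate = {misses / n_trials:.3f}")
```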

Example time.

When you are applying for a job in the data science industry, one of the frequent interview questions you will be asked is “Provide examples of situations when a false positive is more important than a false negative”, and vice versa.

Let’s see how you can answer this question and win over the interviewer.

1. Pregnancy test.

The question you are asking when you are taking a pregnancy test is: am I pregnant? But!

The null hypothesis is: I am not pregnant.

False positive in this situation is a positive pregnancy test, when you are not pregnant. The false negative would be when you are pregnant but the test shows you aren’t. Which one is the bigger problem? Well, it really depends on whether you want to have a child at the moment or not.

If you want a child, the false positive will make you very happy but for a very short time — until you realize it was an error. Conversely, a false negative will make you unhappy until a month later, when you become certain you are going to have a baby. Both cases are undesirable, but I personally would prefer the false negative error.

On the other hand, if you don't want a child, the false positive would be the preferable error. You certainly don't want to be pregnant without knowing it.

To sum up, there is no universally 'better' error here. Depending on your circumstances, you may prefer one or the other.

Trivia time!

Pregnancy tests minimize the false negative. Statistically speaking, this increases the power of the test. There are a number of medical reasons to get a false positive, but false negatives appear only due to faulty execution of the test.

2. AIDS test

A patient goes to the hospital to take an HIV test.

The null hypothesis is: the patient is not infected with HIV.

A false positive would be when the patient gets a result saying she has HIV, although she doesn’t. This has the capacity to be devastating for her and her relationships with others. But at some point, after starting the treatment, the doctors will find out that she doesn’t really have the virus and all will be well again.

A false negative, on the other hand, would mean that the patient has HIV, but the test showed negative results. It is unlikely that this patient would repeat the test anytime soon, and the probability of passing the virus to other people is significant.

Clearly, the false negative here is the much bigger problem, both for the person and for society.

Trivia. Many doctors call HIV test results 'reactive', rather than positive, precisely because of false positives. Before a patient is definitively declared HIV positive, a series of tests is carried out; the diagnosis is never based on a single blood sample.
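A small Bayes' rule sketch shows why this caution makes sense. The prevalence, sensitivity, and specificity below are illustrative assumptions rather than figures for any real test, but they capture the point: when a condition is rare, even a very accurate test produces mostly false positives among its positive results.

```python
# Why a single "reactive" result is not a diagnosis: a Bayes' rule sketch.
# The prevalence, sensitivity, and specificity are illustrative assumptions,
# not real figures for any particular HIV test.
prevalence  = 0.001   # assume 1 in 1,000 people in the tested population has HIV
sensitivity = 0.997   # P(test positive | HIV)
specificity = 0.995   # P(test negative | no HIV)

p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
ppv = sensitivity * prevalence / p_positive   # P(HIV | test positive)

print(f"P(actually HIV positive | reactive result) = {ppv:.2%}")
# With these assumed numbers, only about 17% of reactive results are true
# positives, which is why confirmatory tests on further samples are standard.
```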

3. Presumption of innocence

One of the big questions in criminal justice used to be whether a suspect should be presumed guilty unless she proves she is innocent, or presumed innocent unless someone else proves she is guilty.

The current state of affairs is: innocent, unless proven guilty. This comes from the Latin

‘Ei incumbit probatio, qui dicit, non qui negat; cum per rerum naturam factum negantis probatio nulla sit’, which translates to: “The proof lies upon him who affirms, not upon him who denies; since, by the nature of things, he who denies a fact cannot produce any proof.”*

Statistically speaking,

the null hypothesis is: the suspect is innocent.

The alternative hypothesis is: the suspect is guilty.

This makes sense, right? Now let’s think about the statistical errors.

A false positive in this case is someone who is found guilty when, in fact, innocent. And the false negative is when someone is found innocent when, in fact, guilty.

Proving that someone is guilty is not easy, but proving that someone is innocent is much, much harder. Without going too deep into the history, the norm became that the false positive is the bigger problem. In other words, we should not put innocent people behind bars. The rationale is that protecting one innocent person, even in exchange for 5 guilty people walking free, is still worth it.

Trivia. Until recently Mexico was using the ‘guilty, unless proven innocent’ system. As a result, judges would not even open most criminal cases, because they would have to put too many innocent people in jail. Since 2008, Mexico’s criminal justice system has been transitioning to ‘innocent, unless proven guilty’.

4. Breath alcohol test

We hate it when the police stop us for an alcohol check. But, what are the false positive and false negative issues there?

A false positive would be a breath alcohol test that shows you’ve had more drinks than the acceptable limit, when you haven’t. A false negative shows you are below the limit, when you aren’t.

I’ve heard of both false positive and false negative results because breath alcohol sampling has many issues. Therefore, in some countries, after a positive test, the law allows you to request a blood or urine sample to prove you are innocent (if you are, that is).

What’s interesting for us, however, is which error is worse. We have a clear winner. Due to the safety net of the blood and urine samples, the false negative is the bigger problem, if not the only problem. You definitely don’t want to acquit drunk drivers, as they pose a threat to society. Someone losing a couple of hours due to a false positive is a low price to pay for mitigating the false negative.

Trivia. Common blood alcohol levels at which people are considered legally impaired for driving range from 0.00% to 0.08%. The most common benchmarks around the world are 0.00%, also known as zero tolerance, and 0.05%. The limit is highest in the Cayman Islands, standing at 0.1%. But before driving drunk there, keep in mind that the local police really do enforce the law with frequent checks.

5. SPAM

Finally, I want to touch upon false positives and false negatives in SPAM filtering. You’ve probably seen websites that tell you: ‘Please check your SPAM folder. The email that we just sent you may end up there.’ Email providers increasingly use data mining algorithms to decide whether an email is SPAM. This is a very interesting process, which deserves a separate article. Here, however, we will talk about misplaced emails.

I was truly stunned some weeks ago, when I sent an email to my sister and her email provider marked it as SPAM. The only explanation I have is that I used my personal Yahoo mailbox to email my sister’s company email address.

So, let’s see the null hypothesis of a SPAM test.

Null hypothesis: this email is SPAM.

If you reject the null hypothesis, the email goes through. Otherwise, it is marked as SPAM and goes to the Junk box. The false positive in this case is a SPAM email that doesn’t get filtered. In my case, however, the algorithm had accepted the null hypothesis when it was false, thus making a type II error, or a false negative.

What are the implications? Well, people are so wary of SPAM these days that protection against it is constantly evolving. Cool. But this sometimes filters out important, legitimate correspondence, such as my private conversations with my sister.
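In practice, a spam filter turns some score into a yes/no decision with a cutoff, and moving that cutoff is exactly the trade-off between the two errors we discussed earlier. Here is a toy sketch with made-up scores; nothing here reflects how any real provider’s filter actually works.

```python
# A hedged sketch with made-up spam scores: moving the filter's cutoff trades
# spam that slips through against legitimate mail that lands in the junk box.
emails = [
    # (spam_score assigned by the filter, actually_spam)
    (0.95, True), (0.80, True), (0.62, True), (0.40, True),
    (0.55, False), (0.30, False), (0.10, False), (0.05, False),
]

for cutoff in (0.35, 0.50, 0.70):
    slipped_through = sum(1 for score, is_spam in emails if is_spam and score < cutoff)
    wrongly_junked  = sum(1 for score, is_spam in emails if not is_spam and score >= cutoff)
    print(f"cutoff {cutoff:.2f}: spam delivered = {slipped_through}, "
          f"legitimate mail junked = {wrongly_junked}")
```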

So, which problem is worse? Getting a couple of SPAM emails every now and then, but keeping all your personal and business correspondence intact, or getting no SPAM, but creating tension in your personal or business life?

Again, it depends on your personal preferences.

If I have to compare the false positive and the false negative in this case, I would say the false negative is the bigger problem here.

Trivia. Over 95% of the friend requests you send on Facebook are accepted, as you usually reach out to people you know. This is not true for SPAM accounts, and it is one of the ways Facebook detects them. However, bots have recently adopted a strategy of posing as attractive women and targeting male users. Because male users, on average, accept these friend requests, it takes much longer to detect the bots.

These are some common situations in which you encounter false positives and false negatives. Which error is the bigger pain in the neck depends on the study design and, in some situations, on your personal preferences.

And don’t worry, if you don’t want to change your personal preferences, you can always change your hypothesis test. Bear that in mind the next time you are testing.

*F. Nan Wagoner (1917–06–01). “Wagoner’s Legal Quotes web page”. Wagonerlaw.com. Retrieved 2017–05–02.

Originally published at 365datascience.com
