# The Fundamentals of Bayes’ Theorem in Python

Apr 29 · 5 min read

Getting back to the basics of Bayes’ Theorem using Python.

## Thomas Bayes and Bayesianism

Thomas Bayes was a rather obscure 18th-century English clergyman. It is not even certain when and where he was born, but it was around 1701 and possibly in Hertfordshire, just north of London. His only mark on history is the eponymous Bayes’ Theorem, but the name Bayesian is now used in many different areas, sometimes with only tenuous links to the original theorem.

This gives the impression that Bayesianism is a huge and complex field covering not just probability but extending into philosophy, computer science and beyond. In this article I will get back to the basics of the theorem, first by applying it to its “standard” example of medical tests, and then by writing a simple demonstration of its use in Python.

## Bayes’ Theorem

Bayes’ Theorem is basically a simple formula so let’s start by chalking it up.
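In its general form, for any two events, the theorem is usually written as below. The letters A and B are generic placeholders here, not names taken from the medical example that follows.

$$
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
$$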

You may be familiar with the P(A) notation used in probability theory to denote the probability of a specific outcome or event, A. Probabilities are written as decimals between 0 and 1, and the probabilities of all possible outcomes add up to 1. For example, if we know 1% of the population has a certain disease then P(ill) = 0.01, P(healthy) = 0.99 and 0.01 + 0.99 = 1.

The | symbol used in the formula extends the notation to indicate the probability of a certain outcome given an existing state, and the | can be read as “given”. If in the above example we assume a test is available for the disease then Bayes’ Theorem allows us to calculate the probability of a person having the disease given a positive test result, written P(ill|positive).

You might assume that the probability of someone having a disease if they test positive is 1, and conversely the probability is 0 if they test negative. Unfortunately no medical test is perfect: some people with the disease will test negative and some people who do not have the disease will test positive. Even with a highly accurate test this can lead to some startlingly inaccurate results, as we will see.

Let’s make up a few fictitious numbers for an equally fictitious disease, just for demonstration purposes. We need to know the population and the percentage which has the disease. We also need a couple of numbers to describe the accuracy of the test: what percentage of people with the disease test positive, and what percentage of people who are healthy test negative. These are the test’s sensitivity and specificity respectively.

Now let’s assume everyone has been tested and we have the following figures:

The sensitivity and specificity rates of 99% look impressive, but as you can see from the table above, the number of healthy people who wrongly tested positive (shown in bold) is exactly the same as the number of ill people who correctly tested positive (again shown in bold). This is because 1% of the 99% who are healthy and 99% of the 1% who are ill each come to 0.99% of the population. Therefore if a person tests positive there is only a probability of 0.5 that they are actually ill.
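The figures can be reproduced from the three rates alone. The sketch below does so for a hypothetical population of 1,000,000, an assumed number picked only to give round counts; the variable names are also just illustrative. It prints the four groups and then the share of positive results that belong to people who are actually ill.

```python
# Rebuild the test-result counts from prevalence, sensitivity and specificity.
# POPULATION is an assumed figure chosen for round numbers; the rates are the
# fictitious 1% / 99% / 99% values used in the text.
POPULATION = 1_000_000
P_ILL = 0.01
SENSITIVITY = 0.99   # P(positive | ill)
SPECIFICITY = 0.99   # P(negative | healthy)

ill = round(POPULATION * P_ILL)
healthy = POPULATION - ill

true_positives = round(ill * SENSITIVITY)       # ill and tested positive
false_negatives = ill - true_positives          # ill but tested negative
true_negatives = round(healthy * SPECIFICITY)   # healthy and tested negative
false_positives = healthy - true_negatives      # healthy but tested positive

# Print a small 2 x 2 table of counts with row totals.
print(f"{'':<10}{'Positive':>12}{'Negative':>12}{'Total':>12}")
print(f"{'Ill':<10}{true_positives:>12,}{false_negatives:>12,}{ill:>12,}")
print(f"{'Healthy':<10}{false_positives:>12,}{true_negatives:>12,}{healthy:>12,}")

# Of everyone who tested positive, what fraction is actually ill?
p_ill_given_positive = true_positives / (true_positives + false_positives)
print(f"P(ill | positive) = {p_ill_given_positive}")
```

The two "Positive" cells come out equal, which is why the final ratio is one half whatever population size is chosen.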

## Plugging the Numbers into the Formula

Using the process above we established the probability that a person who tests positive actually has the disease. However, it was a messy process which can be simplified by using the formula for Bayes’ Theorem.

This is the theorem applied to our sample problem, which as you can see gives us the 0.5 result we are looking for.
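Written out with the fictitious figures from earlier (1% prevalence and 99% sensitivity), the calculation takes roughly this shape. The denominator, P(positive) = 0.0198, follows from the expansion described next.

$$
P(\text{ill} \mid \text{positive}) = \frac{P(\text{positive} \mid \text{ill})\,P(\text{ill})}{P(\text{positive})} = \frac{0.99 \times 0.01}{0.0198} = 0.5
$$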

The values above the line are straightforward, and come straight from our table of known data. However, the part below the line, P(positive), needs to be calculated from:

P(healthy) * P(positive|healthy) + P(ill) * P(positive|ill)

This gives us the overall probability of testing positive, irrespective of whether the subject is ill or healthy.
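With the example’s figures, and taking P(positive|healthy) as 1 minus the 99% specificity, the sum works out as:

$$
P(\text{positive}) = 0.99 \times 0.01 + 0.01 \times 0.99 = 0.0099 + 0.0099 = 0.0198
$$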

## Let’s Code It

We can stare at a (virtual) blackboard all day but to fully understand what’s going on it’s a good idea to implement the formula in code. This also gives us the opportunity to change values quickly and easily to see how this affects the outcome.

The code for this project is all in one short file called bayes.py which you can clone or download from the GitHub repository.

This is the source code in its entirety.

## The main Function

Here we just create a few variables for the population and probabilities which are then passed to the two functions which calculate the probability of being ill if testing positive.

## The calculate_without_bayes Function

In this function we calculate a few interim values from the specified population and probabilities, and then use them for our ultimate goal of finding the probability of being ill if testing positive.

All the values are then printed which gives an intuitive idea of the process, but this is a bit long-winded so in the next function we’ll do it the “correct” way using Bayes’ formula.
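As a rough sketch only, and not the original listing, a function along these lines could look like the following. The parameter names, the return value and the population of 1,000,000 passed in at the end are all assumptions made for illustration.

```python
def calculate_without_bayes(population, p_ill, sensitivity, specificity):
    """Count-based route: work out head counts first, then take a ratio.

    The signature and return value are assumptions made for this sketch.
    """
    ill = population * p_ill
    healthy = population - ill

    ill_positive = ill * sensitivity                # true positives
    healthy_positive = healthy * (1 - specificity)  # false positives
    all_positive = ill_positive + healthy_positive

    p_ill_if_positive = ill_positive / all_positive

    # Print the interim values so each step of the reasoning is visible.
    print(f"Ill:                    {ill:,.0f}")
    print(f"Healthy:                {healthy:,.0f}")
    print(f"Ill, tested positive:   {ill_positive:,.0f}")
    print(f"Healthy, tested positive: {healthy_positive:,.0f}")
    print(f"All positive results:   {all_positive:,.0f}")
    print(f"P(ill | positive):      {p_ill_if_positive:.4f}")

    return p_ill_if_positive


# Hypothetical inputs: 1,000,000 people, 1% prevalence, 99% / 99% accuracy.
result = calculate_without_bayes(1_000_000, 0.01, 0.99, 0.99)
```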

## The calculate_with_bayes Function

Firstly we need to calculate a couple more probabilities from those we already know: the probability of being healthy, which is 1 - P(ill), and the probability of testing positive if healthy, which is 1 - specificity. After doing this we can go ahead and implement Bayes’ Theorem.

The rest of the function is taken up with printing out the results, including the interim calculations.
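The steps described above might be sketched as follows. Again this is an illustration rather than the original code: the signature, variable names and return value are assumptions.

```python
def calculate_with_bayes(p_ill, sensitivity, specificity):
    """Apply Bayes' Theorem directly to probabilities.

    The signature and return value are assumptions made for this sketch.
    """
    # Probabilities derived from the ones we already know.
    p_healthy = 1 - p_ill
    p_positive_if_ill = sensitivity
    p_positive_if_healthy = 1 - specificity

    # Overall probability of a positive result, ill or healthy.
    p_positive = (p_healthy * p_positive_if_healthy
                  + p_ill * p_positive_if_ill)

    # Bayes' Theorem: P(ill|positive) = P(positive|ill) * P(ill) / P(positive)
    p_ill_if_positive = (p_positive_if_ill * p_ill) / p_positive

    # Print the interim calculations alongside the final result.
    print(f"P(healthy):          {p_healthy:.4f}")
    print(f"P(positive|healthy): {p_positive_if_healthy:.4f}")
    print(f"P(positive):         {p_positive:.4f}")
    print(f"P(ill|positive):     {p_ill_if_positive:.4f}")

    return p_ill_if_positive


# The fictitious figures from the text: 1% prevalence, 99% / 99% accuracy.
result = calculate_with_bayes(0.01, 0.99, 0.99)
```

Note that the population does not appear anywhere in this version: the formula works on probabilities alone, which is what makes it tidier than counting heads.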

## Let’s Run It

Now we can go ahead and run the program with this command:

`python3.8 bayes.py`

The output is

You might want to experiment with different sensitivities and specificities. The 99% figures I used are actually very high and many real-world medical tests are much less accurate which, as you have probably realised, means that the chances of a person having a disease if they test positive can be very low.
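One way to run that experiment is a short loop like the sketch below. The helper name `p_ill_if_positive` and the list of accuracy values are hypothetical choices for illustration, not figures for any real test; the 1% prevalence is the fictitious one used throughout.

```python
def p_ill_if_positive(p_ill, sensitivity, specificity):
    """Bayes' Theorem for the probability of being ill given a positive test."""
    p_positive = p_ill * sensitivity + (1 - p_ill) * (1 - specificity)
    return p_ill * sensitivity / p_positive


P_ILL = 0.01  # fictitious prevalence from the text

# Hypothetical accuracy levels, used for both sensitivity and specificity.
for accuracy in (0.999, 0.99, 0.95, 0.90, 0.80):
    result = p_ill_if_positive(P_ILL, accuracy, accuracy)
    print(f"sensitivity = specificity = {accuracy:.3f}"
          f" -> P(ill | positive) = {result:.3f}")
```

Changing `P_ILL` as well is worth trying, since the rarer the disease, the more the false positives from the healthy majority dominate.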

So does this mean that mass testing or screening of patients even if they have no symptoms is too inaccurate to be worthwhile? This is really a matter of opinion, but if you hear of or have personal experience of misdiagnoses then please bear in mind Thomas Bayes and his theorem.

## Explorations in Python

Explorations and experiments in Python

Written by

## Chris Webb

I am a content writer based in London, and I specialise in software development and related topics.
