Graphically understanding the Inclusion-Exclusion Principle

9 min readOct 19, 2022

--

The Inclusion-Exclusion Principle is an important counting technique in combinatorics and a useful principle in statistics. In statistics, it can be formulated as

A statistical formulation of the Inclusion-Exclusion Principle.

But what does it mean? Can we graphically represent it?

Note: for simplicity, we’re assuming here that all events are independent from each other

1. Union of two events

To understand this principle, let us start by visualizing and calculating the probability of the union of two (independent) non-incompatible events, namely A and B. We can visualize them as two overlapping sets in the event space Ω (take a look to this story to know what non-incompatible means and what the event space Ω is).

Two non-incompatible events in the event space.

Knowing the P(A) probability of A and the P(B) probability of B, we want to calculate the probability of their union

Union of two events.

for example, the probability of “being born in Italy” (event A) OR “living in France” (event B). These two events can also happen together, so they’re not incompatible: you could be born in Italy and do not live in France, live in France but not be born in Italy, or be born in Italy and live in France. And we assume they also are independent: we say that being born in Italy does not condition living in France, and vice versa, even if in real life it could, because the probability of living where you were born is higher than living elsewhere (take a look to this story for the Bayes Theorem and the conditional probability). Graphically, we want to calculate the area of this highlighted region of the event space

Graphical representation of the union of two non-incompatible events.

What we can do? First of all, we can sum up the two known probabilities, that are represented by the circles A and B: P(A) + P(B). But… if we take both circles, we’re counting twice their intersection P(A⋂B), because it “belongs” both to A and B.

Thus, we need to subtract it once to get the desired probability and we know that the probability of the intersection of two independent events (that is, the probability that they happen together, if one does not condition the other) is simply the product of their probabilities. So we can say that

The Inclusion-Exclusion Principle for two non-incompatible events.

We are including the probabilities of the single events and excluding the probability of their intersection.

2. Union of three events

Let now A, B, and C be three independent non-incompatible events. We can represent them as three overlapping circles in the event space Ω.

Three non-incompatible events in the event space.

for example, the probability of “being born in Italy” (event A) OR “living in France” (event B) OR “having a driving license” (event C). As before, these events are not incompatible and we assume they are independent: they can happen alone or together, none of them is mutually exclusive with respect to the others, and none of them can condition the others.

Knowing the P(A) probability of A, the P(B) probability of B, and the P(C) probability of C, we want to calculate the probability of their union

Union of three events.

Graphically, we want to calculate the area of the highlighted region

Graphical representation of the union of three non-incompatible events.

As we did before, we can start by summing up the three known probabilities, i.e. P(A) + P(B) + P(C), represented by the three circles in the event space. But, if we simply sum up all of them, we’re taking more than once each region given by their intersections…!

The intersections between events A, B and C.

Thus, we need to subtract them once, as we did before for the union of three events, but… are we done? Is that enough? Things are now getting a little bit complicated, there are many intersections, and an intersection we didn’t yet considered, A⋂B⋂C: what does it happen to it when we subtract those two-by-two intersections?

We need an advice. So, let us make a table of all sets and intersections

And let’s start by adding the first set A, counting how many times we considered each set and intersection.

Note that, adding A we’re counting once its intersections also because they indeed “belong” to A. Let us now add the second set B

Now, adding B, we’re counting once their intersections too, thus we need to sum up all intersections we’ve counted twice (AB and ABC). Let us finally add the third set C

Good. Now we know that we’ve taken twice all the intersections of two events (AB, AC, a BC) and thrice the intersection of all three events together (ABC). We then need to subtract them. Let’s start by subtracting AB

Note that, subtracting AB we’re also subtracting once ABC too, because it “belongs” to AB, that is ABC is contained in AB, ABC⊂AB! Let’s subtract AC now

In this case too, we’re subtracting ABC also because it “belongs” to AC too. Finally, let’s subtract BC and see what happens

Wait, what…!? Subtracting all the intersections between two events (AB, AC, and BC) we now “have got a hole”: ABC is missing!

The “hole” caused by the exclusion of the intersections two-by-two.

We then need to add it back one time to get the entire desired area

We’re finally done: we took one time only the entire desired area. And, as before, knowing that the probability of the intersection of two or more (independent) events is the product of their probabilities, we can write that

The Inclusion-Exclusion Principle for three independent and non-incompatible events.

We are including the probabilities of the single events (A, B, and C), excluding the probability of their intersections two-by-two (AB, AC, and BC), and including their intersection three-by-three (ABC).

3. Union of four events

We could go further and calculate the probability of the union of four independent non-incompatible events (for example, the probability of “being born in Italy” OR “living in France” OR “having a driving license” OR “being a male”). But we’ve now understood what would happen and what we’d have to do: include, exclude, include, exclude, include, etc… So I’ll simply leave here the graphical representation and the table of add (include) and subtract (exclude) operations

Graphical representation and calculation of the union of four non-incompatible events.

4. Union of n events

Suppose now we’ve got n non-incompatible and independent events, that is

Hypothesis: n non-incompatible and independent events.

Knowing the P(Eᵢ) probability of each Eᵢ events, we want to calculate the probability of their union

First of all, we can sum up the probabilities of all Eᵢ event, but then we have to start the “excluding, including” chain. How do we proceed? We need to include single events, exclude all intersections two-by-two, include all intersections three-by-three, then exclude all intersections four-by-four, etc… till we reach the last intersection n-by-n.

These intersections are combinations without repetition of the n events, taken k-by-k, with k = [1, 2, 3, … n]. Whaaaaaaat…!?

We need to do a bit of combinatorics. In combinatorics we have dispositions and combinations, and they could be with or without repetition.

First, let us define the binomial coefficient

that gives the number of n elements taken k-by-k, and where the operator x! is the factorial of x, i.e.

Factorial of x.

Let us now consider some examples to understand what we’re talking about.

Suppose we have got three cities: Rome, Paris, and London. We want to visit two of them only once. We could to visit: {(Rome, Paris), (Paris, Rome), (Rome, London), (London, Rome), (Paris, London), (London, Paris)}. We have got 6 possibilities. As we can see i) the order matters because, for example, we can visit first Rome then Paris, etc… ii) we’re not visiting each city more than once. These are called dispositions without repetition: “dispositions” are chosen when the order matters, and “without repetition” means that each element must not be taken more than once. How many dispositions without repetition are possible with three elements taken two-by-two? Combinatorics tells us that they are

Dispositions without repetition of 3 elements taken 2-by-2.

Generalizing, if we’ve got n elements taken k-by-k, the number of dispositions without repetition are

Dispositions without repetition of n elements taken k-by-k.

Let us now suppose we’ve already visited, at least once, two of the three cities. We could have visited: {(Rome, Paris), (Rome, London), (Paris, London)}. We have got three possibilities. Note that, in this case, i) order doesn’t matter, and ii) we’re not interested in how many times we did visit each city. These are called combinations without repetition: “combinations” are chosen when order doesn’t matter, and “without repetition” means that each element must not be taken more than once. How many combinations without repetition have you got for three elements taken two-by-two? Combinatorics says

Combinations without repetition of 3 elements taken 2-by-2.

Generalizing, we can say that the number of combinations without repetition of n elements taken k-by-k are

These are exactly what we need because, when we’re including-excluding the intersections between the events in the event space, so the order doesn’t matter because P(A∩B) = P(B∩A), and we do not want to take each element more than once (because A∩A = A itself and we’ve already taken it). But… given a set E of n elements, we can say that each element Eᵢ is the combination without repetition of n elements taken one-by-one…

Combinations without repetition of n elements taken one-by-one. Note that n! = 1•2•3…(n-1)•n = (n-1)!•n.

Though, we do not need to count combinations without repetition but to generate all of them. Given a set E of n elements we can say that the combinations without repetition taken k-by-k are

Generation of the combinations without repetition of the elements of E taken k-by-k.

In our example, E = {Rome, Paris, London} and then

Generation of the combinations without repetition of the three cities taken 2-by-2.

So, getting back to our inclusion-exclusion problem, we need to generate all combinations without repetition of the n elements of E in our event space, multiply the elements in each generated subset S and finally sum them up

Sum of products of combinations without repetition of the elements of E taken i-by-i.

This way, we generate all S combinations without repetition of E taken i-by-i, multiply the elements in each S subset and sum up the products.

Now, we have to include-exclude for i that goes from 1 (the elements themselves, as we saw earlier) till n (the intersection of all the elements). We thus need a summation with alternate signs. A way to achieve it is simply to raise (-1) to a power: when the exponent is zero or even we got (+1) so we sum (include), when the exponent is odd we got (-1) and we subtract (exclude).

Summation with alternate signs, starting from (+1), i.e from inclusion.

Putting it all together, we finally get to desired generalized formula of the Inclusion-Exclusion Principle

Graphically understanding the Inclusion-Exclusion Principle

1. Union of two events

2. Union of three events

3. Union of four events

4. Union of n events

Written by Massimo Pierini