Graphically understanding the Inclusion-Exclusion Principle
The Inclusion-Exclusion Principle is an important counting technique in combinatorics and a useful principle in statistics. In statistics, it can be formulated as
But what does it mean? Can we graphically represent it?
Note: for simplicity, we’re assuming here that all events are independent from each other
1. Union of two events
To understand this principle, let us start by visualizing and calculating the probability of the union of two (independent) non-incompatible events, namely A and B. We can visualize them as two overlapping sets in the event space Ω (take a look to this story to know what non-incompatible means and what the event space Ω is).
Knowing the P(A) probability of A and the P(B) probability of B, we want to calculate the probability of their union
for example, the probability of “being born in Italy” (event A) OR “living in France” (event B). These two events can also happen together, so they’re not incompatible: you could be born in Italy and do not live in France, live in France but not be born in Italy, or be born in Italy and live in France. And we assume they also are independent: we say that being born in Italy does not condition living in France, and vice versa, even if in real life it could, because the probability of living where you were born is higher than living elsewhere (take a look to this story for the Bayes Theorem and the conditional probability). Graphically, we want to calculate the area of this highlighted region of the event space
What we can do? First of all, we can sum up the two known probabilities, that are represented by the circles A and B: P(A) + P(B). But… if we take both circles, we’re counting twice their intersection P(A⋂B), because it “belongs” both to A and B.
Thus, we need to subtract it once to get the desired probability and we know that the probability of the intersection of two independent events (that is, the probability that they happen together, if one does not condition the other) is simply the product of their probabilities. So we can say that
We are including the probabilities of the single events and excluding the probability of their intersection.
2. Union of three events
Let now A, B, and C be three independent non-incompatible events. We can represent them as three overlapping circles in the event space Ω.
for example, the probability of “being born in Italy” (event A) OR “living in France” (event B) OR “having a driving license” (event C). As before, these events are not incompatible and we assume they are independent: they can happen alone or together, none of them is mutually exclusive with respect to the others, and none of them can condition the others.
Knowing the P(A) probability of A, the P(B) probability of B, and the P(C) probability of C, we want to calculate the probability of their union
Graphically, we want to calculate the area of the highlighted region
As we did before, we can start by summing up the three known probabilities, i.e. P(A) + P(B) + P(C), represented by the three circles in the event space. But, if we simply sum up all of them, we’re taking more than once each region given by their intersections…!
Thus, we need to subtract them once, as we did before for the union of three events, but… are we done? Is that enough? Things are now getting a little bit complicated, there are many intersections, and an intersection we didn’t yet considered, A⋂B⋂C: what does it happen to it when we subtract those two-by-two intersections?
We need an advice. So, let us make a table of all sets and intersections
And let’s start by adding the first set A, counting how many times we considered each set and intersection.
Note that, adding A we’re counting once its intersections also because they indeed “belong” to A. Let us now add the second set B
Now, adding B, we’re counting once their intersections too, thus we need to sum up all intersections we’ve counted twice (AB and ABC). Let us finally add the third set C
Good. Now we know that we’ve taken twice all the intersections of two events (AB, AC, a BC) and thrice the intersection of all three events together (ABC). We then need to subtract them. Let’s start by subtracting AB
Note that, subtracting AB we’re also subtracting once ABC too, because it “belongs” to AB, that is ABC is contained in AB, ABC⊂AB! Let’s subtract AC now
In this case too, we’re subtracting ABC also because it “belongs” to AC too. Finally, let’s subtract BC and see what happens
Wait, what…!? Subtracting all the intersections between two events (AB, AC, and BC) we now “have got a hole”: ABC is missing!
We then need to add it back one time to get the entire desired area
We’re finally done: we took one time only the entire desired area. And, as before, knowing that the probability of the intersection of two or more (independent) events is the product of their probabilities, we can write that
We are including the probabilities of the single events (A, B, and C), excluding the probability of their intersections two-by-two (AB, AC, and BC), and including their intersection three-by-three (ABC).
3. Union of four events
We could go further and calculate the probability of the union of four independent non-incompatible events (for example, the probability of “being born in Italy” OR “living in France” OR “having a driving license” OR “being a male”). But we’ve now understood what would happen and what we’d have to do: include, exclude, include, exclude, include, etc… So I’ll simply leave here the graphical representation and the table of add (include) and subtract (exclude) operations
4. Union of n events
Suppose now we’ve got n non-incompatible and independent events, that is
Knowing the P(Eᵢ) probability of each Eᵢ events, we want to calculate the probability of their union
First of all, we can sum up the probabilities of all Eᵢ event, but then we have to start the “excluding, including” chain. How do we proceed? We need to include single events, exclude all intersections two-by-two, include all intersections three-by-three, then exclude all intersections four-by-four, etc… till we reach the last intersection n-by-n.
These intersections are combinations without repetition of the n events, taken k-by-k, with k = [1, 2, 3, … n]. Whaaaaaaat…!?
We need to do a bit of combinatorics. In combinatorics we have dispositions and combinations, and they could be with or without repetition.
First, let us define the binomial coefficient
that gives the number of n elements taken k-by-k, and where the operator x! is the factorial of x, i.e.
Let us now consider some examples to understand what we’re talking about.
Suppose we have got three cities: Rome, Paris, and London. We want to visit two of them only once. We could to visit: {(Rome, Paris), (Paris, Rome), (Rome, London), (London, Rome), (Paris, London), (London, Paris)}. We have got 6 possibilities. As we can see i) the order matters because, for example, we can visit first Rome then Paris, etc… ii) we’re not visiting each city more than once. These are called dispositions without repetition: “dispositions” are chosen when the order matters, and “without repetition” means that each element must not be taken more than once. How many dispositions without repetition are possible with three elements taken two-by-two? Combinatorics tells us that they are
Generalizing, if we’ve got n elements taken k-by-k, the number of dispositions without repetition are
Let us now suppose we’ve already visited, at least once, two of the three cities. We could have visited: {(Rome, Paris), (Rome, London), (Paris, London)}. We have got three possibilities. Note that, in this case, i) order doesn’t matter, and ii) we’re not interested in how many times we did visit each city. These are called combinations without repetition: “combinations” are chosen when order doesn’t matter, and “without repetition” means that each element must not be taken more than once. How many combinations without repetition have you got for three elements taken two-by-two? Combinatorics says
Generalizing, we can say that the number of combinations without repetition of n elements taken k-by-k are
These are exactly what we need because, when we’re including-excluding the intersections between the events in the event space, so the order doesn’t matter because P(A∩B) = P(B∩A), and we do not want to take each element more than once (because A∩A = A itself and we’ve already taken it). But… given a set E of n elements, we can say that each element Eᵢ is the combination without repetition of n elements taken one-by-one…
Though, we do not need to count combinations without repetition but to generate all of them. Given a set E of n elements we can say that the combinations without repetition taken k-by-k are
In our example, E = {Rome, Paris, London} and then
So, getting back to our inclusion-exclusion problem, we need to generate all combinations without repetition of the n elements of E in our event space, multiply the elements in each generated subset S and finally sum them up
This way, we generate all S combinations without repetition of E taken i-by-i, multiply the elements in each S subset and sum up the products.
Now, we have to include-exclude for i that goes from 1 (the elements themselves, as we saw earlier) till n (the intersection of all the elements). We thus need a summation with alternate signs. A way to achieve it is simply to raise (-1) to a power: when the exponent is zero or even we got (+1) so we sum (include), when the exponent is odd we got (-1) and we subtract (exclude).
Putting it all together, we finally get to desired generalized formula of the Inclusion-Exclusion Principle