The Inclusion-Exclusion Principle¶
The inclusion-exclusion principle is an important combinatorial way to compute the size of a set or the probability of complex events. It relates the sizes of individual sets with their union.
Statement¶
The verbal formula¶
The inclusion-exclusion principle can be expressed as follows:
To compute the size of a union of multiple sets, it is necessary to sum the sizes of these sets separately, and then subtract the sizes of all pairwise intersections of the sets, then add back the size of the intersections of triples of the sets, subtract the size of quadruples of the sets, and so on, up to the intersection of all sets.
The formulation in terms of sets¶
The above definition can be expressed mathematically as follows:
And in a more compact way:
The formulation using Venn diagrams¶
Let the diagram show three sets
Then the area of their union
It can also be generalized for an association of
The formulation in terms of probability theory¶
If
And in a more compact way:
Proof¶
For the proof it is convenient to use the mathematical formulation in terms of set theory:
We want to prove that any element contained in at least one of the sets
Consider an element
- in terms which
- in terms which
- in terms which
- in terms which
- in terms which
This leads us to the following sum of binomial coefficients:
This expression is very similar to the binomial expansion of
When
Generalization for calculating number of elements in exactly sets¶
Inclusion-exclusion principle can be rewritten to calculate number of elements which are present in zero sets:
Consider its generalization to calculate number of elements which are present in exactly
To prove this formula, consider some particular
The sets on the left side do not intersect for different
Usage when solving problems¶
The inclusion-exclusion principle is hard to understand without studying its applications.
First, we will look at three simplest tasks "at paper", illustrating applications of the principle, and then consider more practical problems which are difficult to solve without inclusion-exclusion principle.
Tasks asking to "find the number of ways" are worth of note, as they sometimes lead to polynomial solutions, not necessarily exponential.
A simple task on permutations¶
Task: count how many permutations of numbers from
Let's count the number of "bad" permutations, that is, permutations in which the first element is
We will denote by
After a simple combinatorial calculation, we will get to:
The only thing left is to subtract this number from the total of
A simple task on (0, 1, 2) sequences¶
Task: count how many sequences of length
Again let us turn to the inverse problem, i.e. we calculate the number of sequences which do not contain at least one of the numbers.
Let's denote by
- The size of each
- The size of each pairwise intersection
- The size of the intersection of all three sets is equal to
As we solved the inverse problem, we subtract it from the total of
Number of upper-bound integer sums¶
Consider the following equation:
where
Task: count the number of solutions to the equation.
Forget the restriction on
We will now calculate the number of "bad" solutions with the inclusion-exclusion principle. The "bad" solutions will be those in which one or more
Denote by
Similarly, the size of the intersection between two sets
The size of each intersection of three sets is zero, since
Combining all this into the formula of inclusions-exceptions and given that we solved the inverse problem, we finally get the answer:
This easily generalizes to
As above, we treat binomial coefficients with negative upper index as zero.
Note this problem could also be solved with dynamic programming or generating functions. The inclusion-exclusion answer is computed in
The number of relative primes in a given interval¶
Task: given two numbers
Let's solve the inverse problem - compute the number of not mutually primes with
We will denote the prime factors of
How many numbers in the interval
However, if we simply sum these numbers, some numbers will be summarized several times (those that share multiple
We will iterate over all
Here is a C++ implementation:
int solve (int n, int r) {
vector<int> p;
for (int i=2; i*i<=n; ++i)
if (n % i == 0) {
p.push_back (i);
while (n % i == 0)
n /= i;
}
if (n > 1)
p.push_back (n);
int sum = 0;
for (int msk=1; msk<(1<<p.size()); ++msk) {
int mult = 1,
bits = 0;
for (int i=0; i<(int)p.size(); ++i)
if (msk & (1<<i)) {
++bits;
mult *= p[i];
}
int cur = r / mult;
if (bits % 2 == 1)
sum += cur;
else
sum -= cur;
}
return r - sum;
}
Asymptotics of the solution is
The number of integers in a given interval which are multiple of at least one of the given numbers¶
Given
The solution algorithm is almost identical to the one for previous task — construct the formula of inclusion-exclusion on the numbers
So we will now iterate over all
The number of strings that satisfy a given pattern¶
Consider
Notice first that we can easily count the number of strings that satisfy at once all of the specified patterns. To do this, simply "cross" patterns: iterate though the positions ("slots") and look at a position over all patterns. If all patterns have a question mark in this position, the character can be any letter from
Learn now to solve the first version of the problem: when the string must satisfy exactly
To solve it, iterate and fix a specific subset
Where
(If you have a hard time figuring out this, you can try drawing Venn Diagrams.)
If we sum up on all
However, asymptotics of this solution is
We will reverse the formula of inclusion-exclusion and sum in terms of
Now our solution has asymptotics
We will now solve the second version of the problem: find the number of strings that match at least
Of course, we can just use the solution to the first version of the problem and add the answers for sets with size greater than
Looking at Graham's (Graham, Knuth, Patashnik. "Concrete mathematics" [1998] ), we see a well-known formula for binomial coefficients:
Applying it here, we find that the entire sum of binomial coefficients is minimized:
Thus, for this task, we also obtained a solution with the asymptotics
The number of ways of going from a cell to another¶
There is a field
Assume that the sizes
For now, sort the obstacles by their coordinate
Also just learn how to solve a problem without obstacles: i.e. learn how to count the number of ways to get from one cell to another. In one axis, we need to go through
Now to count the number of ways to get from one cell to another, avoiding all obstacles, you can use inclusion-exclusion to solve the inverse problem: count the number of ways to walk through the board stepping at a subset of obstacles (and subtract it from the total number of ways).
When iterating over a subset of the obstacles that we'll step, to count the number of ways to do this simply multiply the number of all paths from starting cell to the first of the selected obstacles, a first obstacle to the second, and so on, and then add or subtract this number from the answer, in accordance with the standard formula of inclusion-exclusion.
However, this will again be non-polynomial in complexity
Here goes a polynomial solution:
We will use dynamic programming. For convenience, push (1,1) to the beginning and (n,m) at the end of the obstacles array. Let's compute the numbers
Let's forget for a second the obstacles and just count the number of paths from cell
When considering an obstacle
We can compute
The number of coprime quadruples¶
You're given
We will solve the inverse problem — compute the number of "bad" quadruples, i.e. quadruples in which all numbers are divisible by a number
We will use the inclusion-exclusion principle while summing over all possible groups of four numbers divisible by a divisor
where
To calculate the function
Thus, using the formula of inclusions-exclusions we sum the number of groups of four divisible by a prime number, then subtract the number of quadruples which are divisible by the product of two primes, add quadruples divisible by three primes, etc.
The number of harmonic triplets¶
You are given a number
- or
- or
First, go straight to the inverse problem — i.e. count the number of non-harmonic triples.
Second, note that any non-harmonic triplet is made of a pair of coprimes and a third number that is not coprime with at least one from the pair.
Thus, the number of non-harmonic triples that contain
Either
or
In both of these cases, it will be counted twice. The first case will be counted when
Now all we have left to solve is to learn to count the number of coprimes to
A faster solution is possible with such modification of the sieve of Eratosthenes:
-
First, we find all numbers in the interval
- To do this we will maintain an array
- During the sieve of Eratosthenes, we will iterate
- To do this we will maintain an array
-
Second, we need to calculate the answer for all
- To do this, remember how the formula of inclusion-exclusion works — actually here we implement the same concept, but with inverted logic: we iterate over a component (a product of primes from the factorization) and add or subtract its term on the formula of inclusion-exclusion of each of its multiples.
- So, let's say we are processing a number
Here's a C++ implementation:
int n;
bool good[MAXN];
int deg[MAXN], cnt[MAXN];
long long solve() {
memset (good, 1, sizeof good);
memset (deg, 0, sizeof deg);
memset (cnt, 0, sizeof cnt);
long long ans_bad = 0;
for (int i=2; i<=n; ++i) {
if (good[i]) {
if (deg[i] == 0) deg[i] = 1;
for (int j=1; i*j<=n; ++j) {
if (j > 1 && deg[i] == 1)
if (j % i == 0)
good[i*j] = false;
else
++deg[i*j];
cnt[i*j] += (n / i) * (deg[i]%2==1 ? +1 : -1);
}
}
ans_bad += (cnt[i] - 1) * 1ll * (n-1 - cnt[i]);
}
return (n-1) * 1ll * (n-2) * (n-3) / 6 - ans_bad / 2;
}
The asymptotics of our solution is
The number of permutations without fixed points (derangements)¶
Prove that the number of permutations of length
and approximately equal to:
(if you round this expression to the nearest whole number — you get exactly the number of permutations without fixed points)
Denote by
We now use the formula of inclusion-exclusion to count the number of permutations with at least one fixed point. For this we need to learn to count sizes of an intersection of sets
because if we know that the number of fixed points is equal
Substituting this into the formula of inclusion-exclusion, and given that the number of ways to choose a subset of size
Then the number of permutations without fixed points is equal to:
Simplifying this expression, we obtain exact and approximate expressions for the number of permutations without fixed points:
(because the sum in brackets are the first
It is worth noting that a similar problem can be solved this way: when you need the fixed points were not among the
Practice Problems¶
A list of tasks that can be solved using the principle of inclusions-exclusions:
- UVA #10325 "The Lottery" [difficulty: low]
- UVA #11806 "Cheerleaders" [difficulty: low]
- TopCoder SRM 477 "CarelessSecretary" [difficulty: low]
- TopCoder TCHS 16 "Divisibility" [difficulty: low]
- SPOJ #6285 NGM2 , "Another Game With Numbers" [difficulty: low]
- TopCoder SRM 382 "CharmingTicketsEasy" [difficulty: medium]
- TopCoder SRM 390 "SetOfPatterns" [difficulty: medium]
- TopCoder SRM 176 "Deranged" [difficulty: medium]
- TopCoder SRM 457 "TheHexagonsDivOne" [difficulty: medium]
- SPOJ #4191 MSKYCODE "Sky Code" [difficulty: medium]
- SPOJ #4168 SQFREE "Square-free integers" [difficulty: medium]
- CodeChef "Count Relations" [difficulty: medium]
- SPOJ - Almost Prime Numbers Again
- SPOJ - Find number of Pair of Friends
- SPOJ - Balanced Cow Subsets
- SPOJ - EASY MATH [difficulty: medium]
- SPOJ - MOMOS - FEASTOFPIGS [difficulty: easy]
- Atcoder - Grid 2 [difficulty: easy]
- Codeforces - Count GCD