Jared Corduan

The (finite) pigeonhole principle states that if you have more labels than objects (with everything finite) and you want to assign all the labels to the objects, at least one of the objects will have multiple labels. Or, using the imagery of pigeons, if you have more pigeons than holes, you cannot place all the pigeons into the holes without at least one hole having multiple pigeons. It is an obvious statement which turns out to be surprisingly useful. Shockingly useful! The statement is attributed to Dirichlet, who actually used a metaphor involving letters (the kind you mail).

There are many great examples of proofs that involve applying the pigeonhole principle. The Mutilated chessboard problem is one of my favorites (hint: each domino covers exactly one black and one white square), but perhaps my favorite is Fermat’s theorem on sums of two squares.

Proof of Fermat’s theorem on sums of two squares

The theorem states: for any odd prime number \(p\), \(p\) can be written as the sum of two squares exactly when it is congruent to one modulo four. In symbols:

\[ p=x^{2}+y^{2} \quad\text{if and only if}\quad p \equiv 1 \pmod 4 \]

Before we get to the fun part of the proof, we must narrow things down. The “only if” direction of the proof is straightforward: since \(p\) is an odd prime, it cannot be congruent to 0 or 2 modulo 4, and 3 can be ruled out by a case analysis on all the sums of squares modulo 4.

The “if” direction is where the fun begins. We make one simplification, using Euler’s criterion, before bringing out the pigeons. Assuming that \(p \equiv 1 \pmod 4 \), Euler’s criterion tells us that there is an \(x\) such that \(x^{2}+1 \equiv 0 \pmod p \). The curious reader can look up the proof of Euler’s criterion, it is fairly short, but we’ll skip it so that we can jump straight to naming pigeons.

We declare one pigeon for each pair of natural numbers \(u, v\) such that \(0 \leq u,v < \sqrt p\). How many pigeons do we have? We have \((\lfloor \sqrt p \rfloor + 1)^{2}\) pigeons, which is definitely greater than \(p\). Foreshadowing!

The holes will be the numbers modulo \(p\), and we will place pigeon \(u, v\) into the hole \(ux - v \pmod p\). The pigeonhole principle tells us that there are numbers \(u_1, u_2, v_1, v_2\) such that \[u_1 x - v_1 \equiv u_2 x - v_2 \pmod p\] A little bit of algebra yields: \[(u_1 - u_2)x \equiv (v_1 - v_2) \pmod p\] \[(u_1 - u_2)^{2}x^{2} \equiv (v_1 - v_2)^2 \pmod p\] \[(u_1 - u_2)^{2}(-1) \equiv (v_1 - v_2)^2 \pmod p\] \[0 \equiv (u_1 - u_2)^{2} + (v_1 - v_2)^2 \pmod p\]

Note that either \((u_1 - u_2)\) or \((v_1 - v_2)\) must be positive since the numbers came from two distinct pigeons.

Therefore \[0 < (u_1 - u_2)^2 + (v_1 - v_2)^2 < (\sqrt p)^2 + (\sqrt p)^2 = 2p\] We have then that \((u_1 - u_2)^2 + (v_1 - v_2)^2\) is a multiple of \(p\) strictly between \(0\) and \(2p\), leaving only \(p\). This completes the proof.

A clue for the usefulness of the pigeonhole principle

There is a strong connection between the pigeonhole principle and mathematical induction. People have studied weak proof systems and have found that adding principles of induction to these weak systems has the same effect as adding the pigeonhole principle as a new axiom. The connection goes quite deep, but requires building up some terminology around the syntax of formulas. The book Metamathematics of First-Order Arithmetic by Hájek and Pudlák explains this in detail (see Chapter 1, Section 2, part b).

The most intuitive explanation that I have found of this connection comes from the Hájek and Pudlák book. A failure of induction translates very easily into a failure of the pigeonhole principle.

A failure of induction, for a formula \(\phi (x)\), would look like this:

\(\phi (0)\) is true
for every \(x\), if \(\phi (x)\) is true, then so is \(\phi (x+1)\)
\(\phi (a)\) is false for some \(a\)

We can translate this into a failure of the pigeonhole principle as follows:

Our pigeons are the numbers \(\leq a\). The holes are the numbers \(< a\). We place pigeons in holes as follows:

if \(\phi (x)\) holds, put pigeon \(x\) into hole \(x\)
if \(\phi (x)\) does not hold, put pigeon \(x\) into hole \(x - 1\)

This is not constructive intuition, and I would love a more constructive view of the connection between induction and the pigeonhole principle.

It also turns out that induction is comparable to some bounding principles. If we view a formula \(\phi (x, y)\) as a partial, multi-valued function, thinking of \(f (x) = y\) exactly when \(\phi (x, y)\), then a bounding principle is a way of stating that if the domain of a function is bounded, then so is its range.

The infinite pigeonhole principle

There is also an infinite version of the pigeon hole principle, which states that you cannot assign an infinite number labels to a finite collection of objects without labeling at least one object with an infinite number of labels. The contrapositive states that a finite union of finite sets is finite.

According to Akihiro Kanamori, in The Mathematical Infinite as a Matter of Method, the pigeon hole principle may have had a role in making the definition of “infinite” rigorous:

In 1872 Dedekind was putting together Was sind und was sollen die Zahlen?, and he would be the first to define infinite set, with the definition being a set for which there is a one-to-one correspondence with a proper subset. This is just the negation of the Pigeonhole Principle. Dedekind in effect had inverted a negative aspect of finite cardinality into a positive existence definition of the infinite.

The connection between the pigeonhole principle, induction, and bounding principles holds in the infinite case as well. See Jeff Hirst’s thesis, theorem 6.4 on page 104.

The gateway to Ramsey theory

Ramsey’s theorem is a generalization of the pigeonhole principle. It has both a finite and an infinite version, which itself has a host of generalizations and leads to a whole field of study.

One special case of Ramsey’s theorem state that any group of six people must contain three people such than one of the following holds:

all three people know each other
none of them know each other

We can re-phrase this in terms of graph theory: if you color the edges of the complete graph on six vertices with two colors, say red and blue, there is a sub-graph of size three such that either all the edges are red or all the edges are blue. The size three sub-graph is called homogeneous or monochromatic.

The finite version of Ramsey’s theorem generalizes the above statement to any number of colors, any size homogeneous set, and multi-dimensional graphs. The size graph that you need to guarantee a homogeneous set of the given size grows extremely fast.

I’ll end this post with a fun quote from Paul Erdős:

Suppose aliens invade the earth and threaten to obliterate it in a year’s time unless human beings can find the Ramsey number for red five and blue five. We could marshal the world’s best minds and fastest computers, and within a year we could probably calculate the value. If the aliens demanded the Ramsey number for red six and blue six, however, we would have no choice but to launch a preemptive attack.