Posts tagged combinatorics

Photo URL is broken

Consider the game of Nim. The best way to become acquainted is to play Nim: The Game, which I've coded up here. Win some games!

Solving Nim

Nim falls under a special class of games, in which, we have several independent games (each pile is its own game), perfect information (you and your opponent know everything), and sequentiality (the game always ends). It turns out every game of this type is equivalent, and we can use Nim as a model on how to solve them.

The most general way to find the optimal strategy is exhaustive enumeration, which we can do recursively.

class GameLogic:

    def get_next_states(self, current_state):
        pass

    def is_losing_state(self, state):
        pass

def can_move_to_win(gameLogic, current_state):
    ## take care of base case
    if gameLogic.is_losing_state(current_state):
        return False
    ## try all adjacent states, these are our opponent's states
    next_states = gameLogic.get_next_states(current_state)
    for next_state in next_states:
        if can_move_to_win(gameLogic, next_state) is False:
            return True # one could possibly return the next move here
    ## otherwise, we always give our opponent a winning state 
    return False

Of course exhaustive enumeration quickly becomes infeasible. However, the fact that there exists this type of recursive algorithm informs us that we should be looking at induction to solve Nim.

One way to get some intuition at the solution is to think of heaps of just 1 object. No heaps mean that we've already lost, and 1 heap means that we'll win. 2 heaps must reduce to 1 heap, which puts our opponent in a winning state, so we lose. 3 heaps must reduce to 2 heaps, so we'll win. Essentially, we'll win if there are an odd number of heaps.

Now, if we think about representing the number of objects in a heap as a binary number, we can imagine each binary digit as its own game. For example, imagine heaps of sizes, $(27,16,8,2,7)$. Represented as binary, this looks like \begin{align*} 27 &= (11011)_2 \\ 16 &= (10000)_2 \\ 8 &= (01000)_2 \\ 2 &= (00010)_2 \\ 7 &= (00111)_2. \end{align*} The columns corresponding to the $2$s place and $4$s place have an odd number of $1$s, so we can put our opponent in a losing state by removing $6$ objects from the last heap. \begin{align*} 27 &= (11011)_2 \\ 16 &= (10000)_2 \\ 8 &= (01000)_2 \\ 2 &= (00010)_2 \\ 7 - 6 = 1 &= (00001)_2. \end{align*}

Now, the XOR of all the heaps is $0$, so there are an even number of $1$s in each column. As we can see, we have a general algorithm for doing this. Let $N_1,N_2,\ldots,N_K$ be the size of our heaps. If $S = N_1 \oplus N_2 \oplus \cdots \oplus N_K \neq 0$, then $S$ has a nonzero digit. Consider the leftmost nonzero digit. Since $2^{k+1} > \sum_{j=0}^k 2^j$, we can remove objects such that this digit changes, and we can chose any digits to the right to be whatever is necessary so that the XOR of the heap sizes is $0$. Here's a more formal proof.

Proof that if $N_1 \oplus N_2 \oplus \cdots \oplus N_K = 0$ we're in a losing state

Suppose there are $K$ heaps. Define $N_k^{(t)}$ to be the number of objects in heap $k$ at turn $t$. Define $S^{(t)} = N_1^{(t)} \oplus N_2^{(t)} \oplus \cdots \oplus N_K^{(t)}$.

We lose when $N_1 = N_2 = \cdots = N_k = 0$, so if $S^{(t)} \neq 0$, the game is still active, and we have not lost. By the above algorithm, we can make it so that $S^{(t+1)} = 0$ for our opponent. Then, any move that our opponent does must necessarily make it so that $S^{(t+2)} \neq 0$. In this manner, our opponent always has $S^{(t + 2s + 1}) = 0$ for all $s$. Thus, either they lose, or they give us a state such that $S^{(t+2s)} \neq 0$, so we never lose. Since the game must end, eventually our opponent loses.

Sprague-Grundy Theorem

Amazingly this same idea can be applied to a variety of games that meets certain conditions through the Sprague-Grundy Theorem. I actually don't quite under the Wikipedia article, but this is how I see it.

We give every indepedent game a nimber, meaning that it is equivalent to a heap of that size in Nim. Games that are over are assigned the nimber $0$. To find the nimber of a non-terminating game position, we look at the nimbers of all the game positions that we can move to. The nimber of this position is smallest nonnegative integer strictly greater than all the nimbers that we can move to. So if we can move to the nimbers $\{0,1,3\}$, the nimber is $2$.

At any point of the game, each of our $K$ independent games has a nimber $N_k$. We're in a losing state if and only if $S = N_1 \oplus N_2 \oplus \cdots \oplus N_K = 0$.

For the $\Leftarrow$ direction, first suppose that $S^{(t)} = N_1^{(t)} \oplus N_2^{(t)} \oplus \cdots \oplus N_K^{(t)} = 0$. Because of the way that the nimber's are defined we any move that we do changes the nimber of exactly one of the independent games, so $N_k^{(t)} \neq N_k^{(t + 1)}$, for the next game positions consist of nimbers $1,2,\ldots, N_k^{(t)} - 1$. If we can move to $N_k^{(t)}$, then actually the nimber of the current game position is $N_k^{(t)} + 1$, a contradiction. Thus, we have ensured that $S^{(t + 1)} \neq 0$. Since we can always move to smaller nimbers, we can think of nimbers as the number of objects in a heap. In this way, the opponent applies the same algorithm to ensure that $S^{(t + 2s)} = 0$, so we can never put the opponent in a terminating position. Since the game must end, we'll eventually be in the terminating position.

For the $\Rightarrow$ direction, suppose that we're in a losing state. We prove the contrapositive. If $S^{(t)} = N_1^{(t)} \oplus N_2^{(t)} \oplus \cdots \oplus N_K^{(t)} \neq 0$, the game has not terminated, and we can make it so that $S^{(t+1)} = 0$ by construction of the nimbers. Thus, our opponent is always in a losing state, and since the game must terminate, we'll win.

Now, this is all very abstract, so let's look at an actual problem.

Floor Division Game

Consider the Floor Division Game from CodeChef. This is the problem that motivated me to learn about all this. Let's take a look at the problem statement.

Henry and Derek are waiting on a room, eager to join the Snackdown 2016 Qualifier Round. They decide to pass the time by playing a game.

In this game's setup, they write $N$ positive integers on a blackboard. Then the players take turns, starting with Henry. In a turn, a player selects one of the integers, divides it by $2$, $3$, $4$, $5$ or $6$, and then takes the floor to make it an integer again. If the integer becomes $0$, it is erased from the board. The player who makes the last move wins.

Henry and Derek are very competitive, so aside from wanting to win Snackdown, they also want to win this game. Assuming they play with the optimal strategy, your task is to predict who wins the game.

The independent games aren't too hard to see. We have $N$ of them in the form of the positive integers that we're given. The numbers are written clearly on the blackboard, so we have perfect information. This game is sequential since the numbers must decrease in value.

A first pass naive solutions uses dynamic programming. Each game position is a nonnegative integer. $0$ is the terminating game position, so its nimber is $0$. Each game position can move to at most $5$ new game positions after dividing by $2$, $3$, $4$, $5$ or $6$ and taking the floor. So if we know the nimbers of these $5$ game positions, it's a simple matter to compute the nimber the current game position. Indeed, here's such a solution.

#include <iostream>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using namespace std;

long long computeNimber(long long a, unordered_map<long long, long long> &nimbers) {
  if (nimbers.count(a)) return nimbers[a];
  if (a == 0LL) return 0LL;
  unordered_set<long long> moves;
  for (int d = 2; d <= 6; ++d) moves.insert(a/d);
  unordered_set<long long> neighboringNimbers;
  for (long long nextA : moves) {
    neighboringNimbers.insert(computeNimber(nextA, nimbers));
  }
  long long nimber = 0;
  while (neighboringNimbers.count(nimber)) ++nimber;
  return nimbers[a] = nimber;
}

string findWinner(const vector<long long> &A, unordered_map<long long, long long> &nimbers) {
  long long cumulativeXor = 0;
  for (long long a : A) {
    cumulativeXor ^= computeNimber(a, nimbers);
  }
  return cumulativeXor == 0 ? "Derek" : "Henry";
}

int main(int argc, char *argv[]) {
  ios::sync_with_stdio(false); cin.tie(NULL);  
  int T; cin >> T;
  unordered_map<long long, long long> nimbers;
  nimbers[0] = 0;
  for (int t = 0; t < T; ++t) {
    int N; cin >> N;
    vector<long long> A; A.reserve(N);
    for (int n = 0; n < N; ++n) {
      long long a; cin >> a;
      A.push_back(a);
    }
    cout << findWinner(A, nimbers) << '\n';
  }
  cout << flush;
  return 0;
}

Unfortunately, in a worst case scenario, we'll have to start computing nimbers from $0$ and to $\max\left(A_1,A_2,\ldots,A_N\right)$. Since $1 \leq A_n \leq 10^{18}$ can be very large, this solution breaks down. We can speed this up with some mathematical induction.

To me, the pattern was not obvious at all. To help discover it, I generated the nimbers for smaller numbers. This is what I found.

Game Positions Nimber Numbers in Interval
$[0,1)$ $0$ $1$
$[1,2)$ $1$ $1$
$[2,4)$ $2$ $2$
$[4,6)$ $3$ $2$
$[6,12)$ $0$ $6$
$[12,24)$ $1$ $12$
$[24,48)$ $2$ $24$
$[48,72)$ $3$ $24$
$[72,144)$ $0$ $72$
$[144,288)$ $1$ $144$
$[288,576)$ $2$ $288$
$[576, 864)$ $3$ $288$
$[864,1728)$ $0$ $864$
$[1728,3456)$ $1$ $1728$
$[3456,6912)$ $2$ $3456$
$[6912, 10368)$ $3$ $3456$
$\vdots$ $\vdots$ $\vdots$

Hopefully, you can start to see some sort of pattern here. We have repeating cycles of $0$s, $1$2, $2$s, and $3$s. Look at the starts of the cycles $0$, $6$, $72$, and $864$. The first cycle is an exception, but the following cycles all start at $6 \cdot 12^k$ for some nonnegative integer $k$.

Also look at the number of $0$s, $1$2, $2$s, and $3$s. Again, with the exception of the transition from the first cycle to the second cycle, the quantity is multiplied by 12. Let $s$ be the cycle start and $L$ be the cycle length. Then, $[s, s + L/11)$ has nimber $0$, $[s + L/11, s + 3L/11)$ has nimber $1$, $[s + 3L/11, s + 7L/11)$ has nimber $2$, and $[s + 7L/11, s + L)$ has nimber $3$.

Now, given that the penalty for wrong submissions on CodeChef is rather light, you could code this up and submit it now, but let's try to rigorously prove it. It's true when $s_0 = 6$ and $L_0 = 66$. So, we have taken care of the base case. Define $s_k = 12^ks_0$ and $L_k = 12^kL_0$. Notice that $L_k$ is always divisible by $11$ since $$L_{k} = s_{k+1} - s_{k} = 12^{k+1}s_0 - 12^{k}s_0 = 12^{k}s_0(12 - 1) = 11 \cdot 12^{k}s_0.$$

Our induction hypothesis is that this pattern holds in the interval $[s_k, s_k + L_k)$. Now, we attack the problem with case analysis.

  • Suppose that $x \in \left[s_{k+1}, s_k + \frac{1}{11}L_{k+1}\right)$. We have that $s_{k+1} = 12s_k$ and $L_{k+1} = 12L_k = 12\cdot 11s_k$, so we have that \begin{align*} 12s_k \leq &~~~x < 2 \cdot 12s_k = 2s_{k+1} \\ \left\lfloor\frac{12}{d}s_k\right\rfloor \leq &\left\lfloor\frac{x}{d}\right\rfloor < \left\lfloor\frac{2}{d}s_{k+1}\right\rfloor \\ 2s_k \leq &\left\lfloor\frac{x}{d}\right\rfloor < s_{k+1} \\ s_k + \frac{1}{11}L_k \leq &\left\lfloor\frac{x}{d}\right\rfloor < s_{k+1} \end{align*} since $2 \leq d \leq 6$, and $L_k = 11 \cdot 12^{k}s_0 = 11s_k$. Thus, $\left\lfloor\frac{x}{d}\right\rfloor$ falls entirely in the previous cycle, but it's large enough so it never has the nimber $0$. Thus, the smallest nonnegative integer among next game positions is $0$, so $x$ has the nimber $0$.
  • Now, suppose that $x \in \left[s_{k+1} + \frac{1}{11}L_{k+1}, s_{k+1} + \frac{3}{11}L_{k+1}\right)$. Then, we have that \begin{align*} s_{k+1} + \frac{1}{11}L_{k+1} \leq &~~~x < s_{k+1} + \frac{3}{11}L_{k+1} \\ 2s_{k+1} \leq &~~~x < 4s_{k+1} \\ 2 \cdot 12s_{k} \leq &~~~x < 4 \cdot 12s_{k} \\ \left\lfloor \frac{2 \cdot 12}{d}s_{k}\right\rfloor \leq &\left\lfloor \frac{x}{d} \right\rfloor < \left\lfloor\frac{4 \cdot 12}{d}s_{k}\right\rfloor \\ s_k + 3s_k = 4s_k \leq &\left\lfloor \frac{x}{d} \right\rfloor < 24s_k = 12s_k + 12s_k = s_{k+1} + s_{k+1} \\ s_k + \frac{3}{11}L_k \leq &\left\lfloor \frac{x}{d} \right\rfloor < s_{k+1} + \frac{1}{11}L_{k+1}. \end{align*} Thus, $\left\lfloor \frac{x}{d} \right\rfloor$ lies in both the previous cycle and current cycle. In the previous cycle it lies in the part with nimbers $2$ and $3$. In the current cycle, we're in the part with nimber $0$. In fact, if we let $d = 2$, we have that \begin{equation*} s_{k+1} \leq \left\lfloor \frac{x}{2} \right\rfloor < s_{k+1} + \frac{1}{11}L_{k+1}, \end{equation*} so we can always reach a game position with nimber $0$. Since we can never reach a game position with nimber $1$, this game position has nimber $1$.
  • Suppose that $x \in \left[s_{k+1} + \frac{3}{11}L_{k+1}, s_{k+1} + \frac{7}{11}L_{k+1}\right)$. We follow the same ideas, here. \begin{align*} s_{k+1} + \frac{3}{11}L_{k+1} \leq &~~~x < s_{k+1} + \frac{7}{11}L_{k+1} \\ 4s_{k+1} \leq &~~~x < 8s_{k+1} \\ 8s_{k} \leq &\left\lfloor \frac{x}{d} \right\rfloor < 4s_{k+1} \\ s_{k} + \frac{7}{11}L_k \leq &\left\lfloor \frac{x}{d} \right\rfloor < s_{k+1} + \frac{3}{11}L_{k+1}, \end{align*} so $\left\lfloor \frac{x}{d} \right\rfloor$ falls in the previous cycle where the nimber is $3$ and in the current cycle where the nimbers are $0$ and $1$. By fixing $d = 2$, we have that \begin{equation} s_{k+1} + \frac{1}{11}L_{k+1} = 2s_{k+1} \leq \left\lfloor \frac{x}{2} \right\rfloor < 4s_{k+1} = s_{k+1} + frac{3}{11}L_{k+1}, \end{equation} so we can always get to a number where the nimber is $1$. By fixing $d = 4$, we \begin{equation} s_{k+1} \leq \left\lfloor \frac{x}{4} \right\rfloor < 2s_{k+1} = s_{k+1} + \frac{1}{11}L_{k+1}, \end{equation} so we can always get to a number where the nimber is $0$. Since a nimber of $2$ is impossible to reach, the current nimber is $2$.
  • Finally, the last case is $x \in \left[s_{k+1} + \frac{7}{11}L_{k+1}, s_{k+1} + L_{k+1}\right)$. This case is actually a little different. \begin{align*} s_{k+1} + \frac{7}{11}L_{k+1} \leq &~~~x < s_{k+1} + L_{k+1} \\ 8s_{k+1} \leq &~~~x < 12s_{k+1} \\ s_{k+1} < \left\lfloor \frac{4}{3}s_{k+1} \right\rfloor \leq &\left\lfloor \frac{x}{d} \right\rfloor < 6s_{k+1}, \end{align*} so we're entirely in the current cycle now. If we use, $d = 6$, then \begin{equation*} s_{k+1} < \left\lfloor \frac{4}{3}s_{k+1} \right\rfloor \leq \left\lfloor \frac{x}{6} \right\rfloor < 2s_{k+1} = s_{k+1} + \frac{1}{11}L_{k+1}, \end{equation*} so we can reach nimber $0$. If we use $d = 4$, we have \begin{equation*} s_{k+1} + \frac{1}{11}L_{k+1} = 2s_{k+1} \leq \left\lfloor \frac{x}{4} \right\rfloor < 3s_{k+1} = s_{k+1} + \frac{2}{11}L_{k+1}, \end{equation*} which gives us a nimber of $1$. Finally, if we use $d = 2$, \begin{equation*} s_{k+1} + \frac{3}{11}L_{k+1} \leq 4s_{k+1} \leq\left\lfloor \frac{x}{4} \right\rfloor < 6s_{k+1} = s_{k+1} + \frac{5}{11}L_{k+1}, \end{equation*} so we can get a nimber of $2$, too. This is the largest nimber that we can get since $d \geq 2$, so $x$ must have nimber $3$.

This covers all the cases, so we're done. Here's the complete code for a $O\left(\max(A_k)\log N\right)$ solution.

#include <cmath>
#include <iostream>
#include <string>>
#include <vector>

using namespace std;

long long computeNimber(long long a) {
  if (a < 6) { // exceptional cases
    if (a < 1) return 0;
    if (a < 2) return 1;
    if (a < 4) return 2;
    return 3;
  }
  unsigned long long cycleStart = 6;
  while (12*cycleStart <= a) cycleStart *= 12;
  if (a < 2*cycleStart) return 0;
  if (a < 4*cycleStart) return 1;
  if (a < 8*cycleStart) return 2;
  return 3;
}

string findWinner(const vector<long long> &A) {
  long long cumulativeXor = 0;
  for (long long a : A) cumulativeXor ^= computeNimber(a);
  return cumulativeXor == 0 ? "Derek" : "Henry";
}

int main(int argc, char *argv[]) {
  ios::sync_with_stdio(false); cin.tie(NULL);  
  int T; cin >> T;
  for (int t = 0; t < T; ++t) {
    int N; cin >> N;
    vector<long long> A; A.reserve(N);
    for (int n = 0; n < N; ++n) {
      long long a; cin >> a;
      A.push_back(a);
    }
    cout << findWinner(A) << '\n';
  }
  cout << flush;
  return 0;
}

In probability, math contests, and programming contests, we often need to count. Here, I'll write about a few cases that I see pretty often. Before we jump into things, recall the binomial coefficient and various ways of calculating it: $$ {n \choose k} = \frac{n!}{n!(n-k)!} = {n - 1 \choose k } + {n - 1 \choose k - 1 }, $$ where $n \geq k$ and is $0$ if $n < k.$ Thus, we compute the binomial coefficient with dynamic programming by using a triangular array: \begin{matrix} 1 \\ 1 & 1 \\ 1 & 2 & 1\\ 1 & 3 & 3 & 1\\ 1 & 4 & 6 & 4 & 1\\ \vdots & \vdots & \vdots & \ddots & \ddots & \ddots \end{matrix} where if we define $$ X_{n,k} = \begin{cases} {n \choose k}, &k \leq n \\ 0, &k > n, \end{cases} $$ then ${n \choose k} = X_{n,k} = X_{n-1,k} + X_{n-1,k-1}.$

We will see that we can count many things in this manner.

From $n$ objects with replacement

Some cases when we draw from $n$ objects with replacement are an infinite deck of cards, assigning types or categories, or drawing from $\{1,2,3,4,5,6\}$ by rolling a dice.

Ordered set of $k$ objects

If we are sampling $k$ objects from $n$ objects with replacement, each of the $k$ objects has $n$ possibilities. Thus, there are are $\boxed{n^k}$ possibilities.

For example, if $k$ distinct people are buying from a selection of $n$ ice cream flavors, there are $n^k$ different ways, this group of people can order. Another common situation is a sequence of $k$ bits. In this case $n = 2,$ so there are $2^k$ possible sequences.

Unordered set of $k$ objects

Another way to think of this is putting $k$ balls into $n$ bins, where the balls are not distinct. The method for solving this problem is also known as stars and bars.

Imagine each ball as a star, so we have $k$ of them. Along with the stars we have $n - 1$ bars. Now arrange these $(n-1) + k$ objects in any order. For example, take $$\star\star \mid \mid \star \mid \star\star\star.$$ This order would correspond to $2$ balls in the bin $1$, $0$ balls in bin $2$, $1$ ball in bin $3$, and $3$ balls in bin $4$. Thus, we have $(n-1 + k)!$ orderings if the objects were distinct. Since the $k$ balls and $n-1$ bars are identical, we divide by $(n-1)!$ and $k!$. Thus, the number of possible sets is $$ \frac{(n-1 + k)!}{(n-1)!k!} = \boxed{{n + k - 1\choose k}.} $$

From $n$ objects without replacement

This scenario common occurs where we have a finite collection of objects such as a deck of cards.

Unordered set of $k$ objects

We might see this situation when counting the number of $5$-card hands in poker for instance. The order that you draw the cards doesn't matter.

If we have $n$ cards, for the first card there are $n$ possibilities. For the next card, there are $n-1$ possibilities. Thus, if we draw $k$ cards, we have $n(n-1)\cdots(n-k+1) = n!/(n-k)!$ possible draws. Since the order doesn't matter, we divide by $k!.$ Thus, the count is $$\frac{n!}{(n-k)!k!} = \boxed{{n \choose k}.}$$

This makes the formula $${n \choose k} = {n-1 \choose k} + {n - 1 \choose k -1}$$ for computing the binomial coefficient intuitive. Imagine trying to choose $k$ objects from $n$ objects. We can either include or not include the $n$th object. If we don't include the $n$th object we choose $k$ objects from the first $n-1$ objects, which gives us the ${n-1 \choose k}$ term. If we do include the $n$th object then, we only choose $k-1$ objects from the first $n-1$ objects, which gives us the ${n-1 \choose k - 1}$ term.

Another common use of the of binomial coefficient is counting paths. Suppose we are on a grid, we can only make right and down moves, and we are trying to get from $A$ to $B$, where $B$ is $k$ moves to the right and $l$ moves downward. Our set of $n = k + l$ objects is $\{1, 2,\ldots,n\}.$ We choose $k$ indices to make a right move, and the rest of the moves will be down. Then, the number of paths is ${n \choose k}.$

Ordered set of $k$ objects

In this case, we care about the order of the cards we draw. From the discussion above, it's calculated the same way as an unordered set of $k$ objects except we don't divide by $k!$, so the number of possible draws is $$ n(n-1)\cdots(n-k+1) = \boxed{\frac{n!}{(n-k)!} = (n)_k,} $$ where the I have used the Pochhammer symbol to denote the falling factorial.

Recontres Numbers

These count the number of permutations with a certain number of fixed points. A permutation on $n$ elements is an element of the symmetric group $S_n$. For those without a background in algebra, it is essentially a way of rearranging $n$ objects. From our discussion of above, there are $n!$ ways to do so. For example, $$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4\\ 3 & 2 & 4 & 1\\ \end{pmatrix}$$ reorders $(1,2,3,4)$ as $(3,2,4,1)$. $\sigma$ can be thought of as a function $\sigma(1) = 3$, $\sigma(2) = 2$, $\sigma(3) = 4$, and $\sigma(4) = 1$. Since for only $x = 2$ do we have $\sigma(x) = x$, there is only $1$ fixed point.

Now let $D_{n,k}$ be the number of permutations of $n$ objects with $k$ fixed points. Let $D_{0,0} = 1$. Clearly, $D_{1,0} = 0$ since $\sigma(1) = 1$ is the only possible permutation for $n = 1.$ Then, we have the recursive formula for $k > 0$ $$ D_{n,k} = {n \choose k}D_{n-k,0}, $$ which can be thought of as taking an unordered set of $k$ points to be fixed from $\{1,2,\ldots,n\}$, hence the ${n \choose k}$. For the remaining $n - k$ points, we have no fixed points because we want exactly $k$ fixed points.

So if we know $D_{0,0}, D_{1,0},\ldots,D_{n,0}$, we can calculate $D_{n+1,1}, D_{n+1,2},\ldots, D_{n+1,n+1}$. Then, for $k = 0$, since there are $(n+1)!$ total permutations, we have that $$ D_{n+1,0} = (n+1)! - \sum_{k=1}^{n+1}D_{n+1,k}. $$ There's a better way to calculate $D_{n,0}$, though, which I learned at CTY. These permutations with no fixed points are called derangements. Clearly, $D_{0,0} = 1$ and $D_{1,0} = 0$.

Now assume $n \geq 2$. Focus on element $n$. A permutation can be thought of as disjoint cycles. Recall the notation in abstract algebra, where we may write $$\sigma = \begin{pmatrix}1 & 2 & 3\end{pmatrix}\begin{pmatrix}4 & 5\end{pmatrix}\in S_5,$$ which gives us $2$ cycles, one of length $3$ and the other of length $2$. A cycle of length $1$ is a fixed point, so $n$ is part of a cycle of length $2$ or more than $2$. In the case that $n$ is part of a cycle of length $2$, there are $n-1$ options for the other element in the cycle. The number of ways to permute the remaining elements is $D_{n-2,0}$. In the case that $n$ is part of a cycle of length greater than $2$, we can consider permuting the first $n-1$ elements with no fixed points. For each such permutation, we have $n - 1$ elements after which we can insert element $n$, so it becomes part of an existing cycle. In this way, we have that $$ D_{n,0} = (n-1)\left(D_{n-1,0} + D_{n-2,0}\right). $$

Again, we have a triangular array \begin{matrix} 1 \\ 0 & 1 \\ 1 & 0 & 1\\ 2 & 3 & 0 & 1\\ 9 & 8 & 6 & 0 & 1\\ \vdots & \vdots & \vdots & \ddots & \ddots & \ddots \end{matrix}

One helpful way to visualize this process that I like is to imagine a dance class with $n$ couples. After each dance everyone has to find a new partner. There are $D_{n,k}$ ways that $k$ couples stay the same.

Bell Numbers

The Bell numbers count the ways to partition a set. Consider the set $S = \{1,2,3\}$. The possible nonempty subsets are $\{1\}$, $\{2\}$, $\{3\}$, $\{1,2\}$, $\{2,3\}$, $\{1,3\}$, and $\{1,2,3\}$. A partition would be a group of disjoint nonempty subsets such that each element of $S$ is an element of some subset in the partition. Thus, our partitions are $\left\{\{a\},\{b\},\{c\}\right\}$, $\left\{\{a\},\{b,c\}\right\}$, $\left\{\{a, c\},\{b\}\right\}$, $\left\{\{a, b\},\{c\}\right\}$, and $\left\{\{a, b,c\}\right\}$.

Let $B_n$ be number of ways to partition a set of $n$ objects. $B_0 = 1$, $B_1 = 1$, $B_2 = 2$ and $B_3 = 5$ for example. In general to calculate $B_{n+1}$, we have the recurrence relation $$ \boxed{B_{n+1} = \sum_{k=0}^n{n \choose k}B_{n-k} = \sum_{k=0}^n{n \choose k}B_{k}} $$ since ${n \choose k} = {n \choose n-k}$. To see this, consider partitions of the set $\{1,2,\ldots,n,n+1\}$. In the partition there is a subset that contains $n+1$, say $S$. We can have $|S| = k + 1 \in \{1,2,\ldots,n,n+1\}$. Clearly, $n+1 \in S$. Choosing the other $k$ elements of $S$ amounts to selecting an unordered set from $\{1,2,\ldots,n\}$, hence the ${n \choose k}$ factor in each term. For the remaining $n + 1 - (k+1) = n - k$ objects there are $B_{n-k}$ ways to partition them. Thus, we have the terms ${n \choose k}B_{n-k}$. We avoid double counting since the partitions corresponding to each term are disjoint because $n+1$ is in a subset of different size.

Catalan Numbers

Consider strings of $n$ $0$s and $n$ $1$s, so the string has length $2n$. Call this set $\Omega^{2n}$. Let $C_n$ be the number of such strings where no initial substring has more $1$s than $0$s. Formally, $$C_n = \left|\left\{ X = (x_1,x_2,\ldots,x_{2n}) \in \{0,1\}^{2n} : \sum_{i=1}^{2n} x_i = n, \sum_{i=1}^k x_i \leq \frac{k}{2}, \forall k \in \{1,2,\ldots,2n\}\right\}\right|.$$ $C_n$ is the $n$th Catalan number.

To calculate these numbers first note that $C_0 = 1$ and that every $X \in \Omega^{2(n+1)}$ for $n \geq 1$ can be written $$X = (0,X_1,1,X_2),$$ where $X_1$ and $X_2$ are elements of of $\Omega^{2k}$ and $\Omega^{2(n-k)}$, respectively for some $k \in \{0,1,\ldots,n\}$. Such a form is unique. To see this, let $X = (x_1,x_2,\ldots,x_{2n},x_{2n+1},x_{2n+2})$. Note that by the defintion, the first number in the string must be a $0$. Since the total numbers of $0$s and $1$s in the sequence must be equal, there exists an even index $j$ such that $\sum_{i=1}^j x_i = j/2$. Fix $j$ to be the smallest such index. We must have that $x_j = 1$ since otherwise the defintion would have been violated as $\sum_{i=1}^{j-1}x_i = j/2 > (j-1)/2$.

Then, we'll have $$X = (x_1 = 0,X_1,x_j = 1,X_2),$$ where $X_1$ is a string of length $2k = j-2$ and $X_2$ has length $2n + 2 - j = 2n-2k$. We show that $X_1 \in \Omega^{2k}$ and $X_2 \in \Omega^{2(n-k)}$. Since there are an equal number of $0$s and $1$s at index $j$, $X_1$ must have an equal number of $0$s and $1$s. If at any point $1 \leq l \leq 2k$, we have that $\sum_{i=2}^{l + 1}x_i > l/2$ then $\sum_{i=1}^{l+1}x_i \geq (l+1)/2$, which implies that $X \not\in \Omega^{2(n+1)}$ or that there is an index smaller than $j$ such that the initial substring has an equal number of $0$s and $1$s since $l+1 \leq j-1$. Both are a contradiction so we have $X_1 \in \Omega^{2k}$. Showing $X_2 \in \Omega^{2(n-k)}$ is similar. We have that $X_2$ must have an equal number of $0$s and $1$s in order for the whole string to have an equal number of $0$s and $1$s. If for any $1 \leq l \leq 2(n-k)$, we have that $\sum_{i=j+1}^{j+l}x_i > l/2$, then $$ \sum_{i=1}^{j+l}x_i = \sum_{i=1}^{j}x_i + \sum_{i=j+1}^{j+l}x_i = \frac{j}{2} + \sum_{i=j+1}^{j+l}x_i > \frac{j}{2} + \frac{l}{2} = \frac{j+l}{2}, $$ which implies that $X \not\in \Omega^{2(n+1)}$, which is a contradiction.

Thus, we have our desired result that $X = (x_1 = 0,X_1,x_j = 1,X_2)$, where $X_1 \in \Omega^{2k}$ and $X_2 \in \Omega^{2(n-k)}$, where $k \in \{0,1,\ldots,n\}$. Varying $k$, we come upon the recurrence relation $$\boxed{C_{n+1} = \sum_{k=0}^nC_kC_{n-k}.}$$ This is a pretty nice solution, but we can actually do better and find a closed-form solution.

Consider the generating function $$ c(x) = \sum_{n=0}^\infty C_nx^n = 1 + \sum_{n=1}^\infty C_nx^n = 1 + x\sum_{n=0}^\infty C_{n+1}x^n. $$ Substituting in the recurrence relation, for $C_{n+1}$, we have that \begin{align*} c(x) &= 1 + x\sum_{n=0}^\infty C_{n+1}x^n = 1 + x\sum_{n=0}^\infty \sum_{k=0}^nC_kC_{n-k}x^n \\ &= 1 + x\sum_{n=0}^\infty x^n\sum_{k=0}^nC_kC_{n-k} = 1 + x\left[\sum_{n=0}^\infty C_n x^n\right]^2 \\ &= 1 + x[c(x)]^2. \end{align*} Solving for $c(x)$ with the quadratic formula, we find that $$ c(x) = \frac{1 \pm \sqrt{1-4x}}{2x} = \frac{2}{1\mp\sqrt{1-4x}}. $$ Since $c(0) = 1$, $$\displaystyle c(x) = \frac{1 - \sqrt{1-4x}}{2x} = \frac{1}{2x}\left(1 - \sqrt{1-4x}\right).$$

Consider the Taylor series of $f(y) = \sqrt{1+y}.$ By induction, $$f^{(n)}(y) = (-1)^{n+1}\frac{\prod_{k=0}^{n-1}(2k-1)}{2^n}(1+y)^{-(2n-1)/2} \Rightarrow f^{(n)}(0) = (-1)^{n+1}\frac{\prod_{k=0}^{n-1}(2k-1)}{2^n}.$$ Moreover, \begin{align*} f^{(n)}(0) &= (-1)^{n+1}\frac{\prod_{k=0}^{n-1}(2k-1)}{2^n} = (-1)^{n+1}\frac{\prod_{k=0}^{n-1}(2k-1)}{2^n}\cdot \frac{2^n n!(2n-1)}{2^n n!(2n-1)} \\ &= \frac{(-1)^{n+1}}{4^n(2n-1)}\frac{(2n)!}{n!}. \end{align*}

Thus, we have that $$ f(y) = \sqrt{1+y} = 1 + \sum_{n=1}^\infty \frac{(-1)^{n+1}}{4^n(2n-1)}\frac{(2n)!}{n!n!}y^n, $$ so we have \begin{align*} c(x) &= \frac{1}{2x}(1 + f(-4x)) = \frac{1}{2x}\left(\sum_{n=1}^\infty \frac{(-1)^{n}}{4^n(2n-1)}\frac{(2n)!}{n!n!}(-4x)^n\right) \\ &= \frac{1}{2x}\left(\sum_{n=1}^\infty \frac{(-1)^{2n}}{(2n-1)}\frac{(2n)!}{n!n!}x^n\right) = \sum_{n=1}^\infty \frac{1}{2n(2n-1)}\frac{(2n)!}{(n-1)!n!}x^{n-1} \\ &= \sum_{n=1}^\infty \frac{1}{n}\frac{(2n-2)!}{(n-1)!(n-1)!}x^{n-1} = \sum_{n=1}^\infty \frac{1}{n}{2(n-1) \choose n-1 }x^{n-1} \\ &= \sum_{n=0}^\infty \frac{1}{n+1}{2n \choose n}x^{n} = \sum_{n=0}^\infty C_nx^{n}, \end{align*} so $\displaystyle \boxed{C_n = \frac{1}{n+1}{2n \choose n}.}$


Photo URL is broken

One of the more interesting problems that I've come across recently is to calculate the distribution of the last time a simple random walk is at $0.$

Let $X_1,X_2,\ldots,X_{2n}$ be independent, indentically distributed random variables such that $P(X_i = 1) = P(X_i = -1) = 1/2.$ Define $S_k = \sum_{i=1}^k X_i.$ Then, we have a path $$(0,0) \rightarrow (1,S_1) \rightarrow (2,S_2) \rightarrow \cdots \rightarrow (2n,S_{2n}).$$ Define the random variable $L_{2n} = \sup\{ k \leq 2n : S_k = 0\}.$ We want the distribution of $L_{2n}.$

Note that we have that \begin{equation} \mathbb{P}(S_{2n} = 0) = 2^{-2n}{2n \choose n} \end{equation} since we have $n$ positive steps and $n$ negative steps.

Let $\displaystyle N_{n,x} ={n \choose (n+x)/2}$ denote the number of paths from $(0,0)$ to $(n,x)$ since $(n+x)/2$ positive steps implies there are $(n-x)/2$ negative steps, and $(n+x)/2 - (n-x)/2 = x$. Note that $n + x$ must be even for this to be well defined. If $n + x$ is not even, then $N_{n,x} = 0$ since $x$ must have the same parity as $n.$ First, we prove the reflection principle.

Reflection Principle

If $x,y > 0,$ the number of paths that from $(0,x)$ to $(n,y)$ that are $0$ at some time, that is, they touch the $x$-axis is equal to the total number of paths from $(0,-x)$ to $(n,y),$ which is $N_{n,y+x}.$ Therefore, the number of paths from $(0,x)$ to $(n,y)$ that do not touch $0$ is $N_{n,|y-x|} - N_{n,y+x}.$

We can establish a one-to-one correspondence between the set $A$, the paths from $(0,x)$ to $(n,y)$ that are $0$ at some time and the set $B$ the paths from $(0,-x)$ to $(n,y)$.

Consider any path $P$ in $A$. $P$ must include the point $(m,0),$ where $0 < m < n$. Fix $m$ to be the greatest such integer. We construct a path $Q_1$ from $(0,-x)$ to $(m,0)$ by going in the opposite direction as $P.$ We construct a path $Q_2$ from $(m,0)$ to $(n,y)$ by mirroring $P$. Thus, we have that $Q = Q_1 \cup Q_2 \in B.$

Now consider any path $Q$ in $B$. Since paths are continuous, $Q$ must cross the $x$-axis, so $Q$ includes a point $(m,0)$, where $0 < m < n.$ Fix $m$ to be the greatest such integer. We construct $P_1,$ a path from $(0,x)$ to $(m,0)$ by going in the opposite direction as $Q$. We construct $P_2$ by mirroring $Q$. Thus, we have that $P = P_1 \cup P_2 \in A.$

So, we have established a one-to-one correspondence, and therefore, we have proven $|A| = |B|.$

Symmetry of Zeroes

$\mathbb{P}(S_1 \neq 0, S_2 \neq 0,\ldots,S_{2n} \neq 0) = \mathbb{P}(S_{2n} = 0).$

First note that $$ \mathbb{P}(S_1 \neq 0,\ldots,S_{2n} \neq 0) = \mathbb{P}(S_1 > 0,\ldots,S_{2n} > 0) + \mathbb{P}(S_1 < 0,\ldots,S_{2n} < 0) $$ since we can never have the path touch $0.$ Also note that the two terms are equal, so \begin{equation} \mathbb{P}(S_1 \neq 0, S_2 \neq 0,\ldots,S_{2n} \neq 0) = 2\mathbb{P}(S_1 > 0, S_2 > 0,\ldots,S_{2n} > 0). \end{equation} Now, note that \begin{equation} \mathbb{P}(S_1 > 0, S_2 > 0,\ldots,S_{2n} > 0) = \sum_{r=1}^{n}\mathbb{P}(S_1 > 0, S_2 > 0,\ldots,S_{2n} = 2r) \end{equation} since we have taken an even number of steps.

To calculate $\mathbb{P}(S_1 > 0, S_2 > 0,\ldots,S_{2n} = 2r),$ we note that $X_1 = 1$ since $S_1 > 0$. Then, the number of paths from $(1,1)$ to $(2n,2r)$ that do not touch $0$ by the Reflection Principle is $N_{2n-1,2r-1} - N_{2n-1,2r+1}$. Thus, \begin{align*} \mathbb{P}(S_1 > 0, S_2 > 0,\ldots,S_{2n} = 2r) &= \left(\frac{1}{2}\right)\left(\frac{1}{2^{2n-1}}\right)\left(N_{2n-1,2r-1} - N_{2n-1,2r+1}\right) \\ &= \left(\frac{1}{2^{2n}}\right)\left(N_{2n-1,2r-1} - N_{2n-1,2r+1}\right). \end{align*}

So, we have that \begin{align*} \mathbb{P}(S_1 > 0, S_2 > 0,\ldots,S_{2n} > 0) &= \left(\frac{1}{2^{2n}}\right) \sum_{r=1}^n\left(N_{2n-1,2r-1} - N_{2n-1,2r+1}\right) \\ &= \frac{1}{2^{2n}}N_{2n-1,1} = \frac{1}{2^{2n}}{2n - 1 \choose n}\\ &= \frac{1}{2^{2n}}\frac{(2n-1)!}{n!(n-1)!} = \frac{1}{2}\frac{1}{2^{2n}}\frac{(2n)!}{n!n!} \\ &= \frac{1}{2}\mathbb{P}(S_{2n} = 0) \end{align*} by telescoping since $N_{2n-1,2n+1} = 0$ and substituting in the previous equations.

Recalling the earlier equation, we have the desired result, $$ \mathbb{P}(S_1 \neq 0, S_2 \neq 0,\ldots,S_{2n} \neq 0) = 2\mathbb{P}(S_1 > 0, S_2 > 0,\ldots,S_{2n} > 0) = \mathbb{P}(S_{2n} = 0). $$

Distribution of Last Zero Visit

Let $L_{2n}$ be the random variable whose value is the last time the simple random walk visited $0.$ Formally, $$L_{2n} = \sup\left\{2k : k \in \{0,\ldots,n\},~\sum_{i=1}^{2k} X_i = 0\right\}.$$ Then, we have that $$\mathbb{P}(L_{2n} = 2k) = 2^{-2n}{2k \choose k}{2n-2k \choose n-k}.$$

To see this, we have that \begin{align*} \mathbb{P}(L_{2n} = 2k) &= \mathbb{P}(S_{2k} = 0, S_{2k+1} \neq 0, \ldots,S_{2n} \neq 0) \\ &= \mathbb{P}(S_{2k} = 0, S_{2k+1} - S_{2k} \neq 0, \ldots,S_{2n} - S_{2k} \neq 0) \\ &= \mathbb{P}(S_{2k} = 0)\mathbb{P}(S_1 \neq 0, \ldots,S_{2n-2k} \neq 0) \\ &= \mathbb{P}(S_{2k} = 0)\mathbb{P}(S_{2n-2k} = 0), \end{align*} where the last equaility is by Symmetry of Zeroes. Thus, we have that $$\mathbb{P}(L_{2n} = 2k) = \mathbb{P}(S_{2k} = 0)\mathbb{P}(S_{2n-2k} = 0) =2^{-2n}{2k \choose k}{2n-2k \choose n-k}$$ as desired.

Conculusion

Let's look at what this distribution looks like when $n = 15.$ We have symmetric distribution about $15$, so the mean is $15.$ However, the distribution is U-shaped, and the most likely values are very far from the mean. Thus, to take an analogy from sports, if two evenly matched teams were to played against each other multiple times over the course of a season, the most likely scenario is for one team team to lead the other team the entire season. So, saying that team A led team B the entire season is an almost meaningless statistic.