Posts tagged algorithm


Think of two sets that can only be matched together. For example, for those that believe in traditional marriage, you can think of $n$ females and $n$ males. Suppose that we know how much happiness each marriage would bring, and assume marriage only brings positive happiness. How should we match couples to maximize happiness?

In another scenario, suppose we have $n$ items and $n$ bidders, and we know how much each bidder is willing to pay for each item. Also, assume each bidder can buy at most $1$ item. To whom should we sell each item to maximize our profit?

We can model this scenario with a complete bipartite graph $G = (X,Y,E)$, where $X$ and $Y$ are two disjoint sets of vertices such that every vertex in $X$ is connected to every vertex in $Y$, but no vertices are connected within $X$ or within $Y$. That is, $(x,y) \in E \Leftrightarrow x \in X \wedge y \in Y$. Each edge has a weight $w_{xy}.$ In our two examples, $w_{xy}$ is the happiness from the marriage of female $x$ and male $y$ or how much item $x$ is worth to bidder $y$.

We want to match or assign each element of $X$ to a distinct element of $Y$. Thus, we have the so-called Assignment Problem. Here, I'll detail how the Hungarian method solves this.

First, we need to introduce some definitions for the algorithm to make sense.

Graph Theory

Matching

The technical definition of a matching is a set $M$ of non-adjacent edges, that is, no two edges in the set share a common vertex. However, I always preferred to think of a matching as pairs of vertices connected by an edge such that each vertex appears in at most 1 pair. Here's an example, where I have colored the matching in red:

[Figure: a bipartite graph on $x_1, x_2, x_3, x_4$ and $y_1, y_2, y_3, y_4$ with the edges of the matching colored red.]

In this case, $M = \{(x_1,y_1),(x_3,y_4),(x_4,y_2)\}.$

Perfect Matching

A matching is maximal if adding any other edge makes it not a matching. Another way to think about this is in terms of sets: a maximal matching cannot be a proper subset of any other matching. Note that maximal is weaker than maximum; a maximal matching need not have the largest possible number of edges.

Now, a matching is perfect if it contains all vertices. For instance,

[Figure: the same bipartite graph with a perfect matching colored red.]

is a perfect matching, where $M = \{(x_1,y_3),(x_2,y_1),(x_3,y_4),(x_4,y_2)\}.$

Alternating Path

Now, one way of representing matchings is to break them up into paths. An alternating path is composed of edges, where every other edge is part of a matching. For example, this is an alternating path, where I have colored edges in the path red and blue, and red edges are part of the matching.

[Figure: an alternating path through the bipartite graph, with matching edges in red and non-matching edges in blue.]

A path is just an ordered set of edges, where two adjacent edges share a common vertex. Our path is $x_1 \rightarrow y_2 \rightarrow x_4 \rightarrow y_1 \rightarrow x_2,$ and the corresponding matching is $M = \left\{(x_4,y_2), (x_2,y_1)\right\}.$

Augmenting Path

An augmenting path is a particular kind of alternating path, in which the first and last vertices in the path are unmatched. Thus, by flipping the edges in the matching, we get a new matching with one more edge. Formally, consider the augmenting path $P$. Clearly, it must contain an odd number of edges. If $M$ is the corresponding matching, then $M$ contains the even-numbered edges in the path $P$. We can make a new, larger matching $P-M$ that contains the odd-numbered edges. For example, if $P = x_1 \rightarrow y_1 \rightarrow x_2 \rightarrow y_2$ with $M = \{(x_2,y_1)\},$ then flipping gives $P - M = \{(x_1,y_1),(x_2,y_2)\}.$ Here's an example of this process.

[Figure: an augmenting path before and after flipping; the new matching covers two additional vertices.]

Labeling

Let us suppose the graph looks like this, where I only draw the edges from one vertex in $X$ at a time for clarity.

[Figure: the weighted graph, drawn one vertex of $X$ at a time. The weights are $w(x_1, \cdot) = (10, 3, 2, 9),$ $w(x_2, \cdot) = (10, 2, 2, 3),$ $w(x_3, \cdot) = (4, 10, 3, 3),$ and $w(x_4, \cdot) = (2, 2, 10, 0).$]

We assign each vertex a nonnegative integer label. Formally, this labeling is a function, $L: X \cup Y \rightarrow \mathbb{Z}.$ Let $w(x,y)$ be the weight of the edge between vertices $x$ and $y.$ An edge $(x,y)$ is considered feasible if $L(x) + L(y) \geq w(x,y).$ A labeling is feasible if $L(x) + L(y) \geq w(x,y)$ for all $x$ and $y$.

Moreover, we define an equality subgraph, $G_L = (X,Y,E_L)$, associated with a labeling as follows: $(x,y) \in E_L \Leftrightarrow L(x) + L(y) = w(x,y).$
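To make these definitions concrete, here's a minimal sketch of how one might extract the equality subgraph from a labeling. The names xLabel, yLabel, and weights are mine, chosen to match the implementation later in this post.

#include <vector>

using namespace std;

// Build adjacency lists of the equality subgraph: for each x in X, collect
// the y in Y with L(x) + L(y) == w(x,y). (A feasible edge instead satisfies
// L(x) + L(y) >= w(x,y).)
vector<vector<int>> equalitySubgraph(const vector<int> &xLabel,
                                     const vector<int> &yLabel,
                                     const vector<vector<int>> &weights) {
  int N = xLabel.size(), M = yLabel.size();
  vector<vector<int>> neighbors(N);
  for (int i = 0; i < N; ++i) {
    for (int j = 0; j < M; ++j) {
      if (xLabel[i] + yLabel[j] == weights[i][j]) neighbors[i].push_back(j);
    }
  }
  return neighbors;
}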

For example, given a specific labeling, here's the equality subgraph, where I've colored the feasible edges in black and the edges in the equality subgraph in green.

[Figure: a feasible labeling of the graph, with the feasible edges in black and the equality subgraph edges in green.]

To me, it's not at all obvious why a labeling would be helpful, but this theorem connects everything.

If a perfect matching exists in an equality subgraph of a feasible labeling $L$, then it is a maximum weighted matching.

Here's a proof. Let $M$ be the perfect matching in the equality subgraph of labeling $L$. Let $M^\prime$ be an alternative perfect matching. Then, we have that the value of the matching $M^\prime$ is \begin{align*} \sum_{(x,y) \in M^\prime} w(x,y) &\leq \sum_{(x,y) \in M^\prime} L(x) + L(y)~~~\text{since $L$ is feasible}\\ &= \sum_{(x,y) \in M} L(x) + L(y) \\ &= \sum_{(x,y) \in M} w(x,y), \end{align*} which is the value of the matching $M.$

Thus, our problem reduces to finding a perfect matching in the equality subgraph of a feasible labeling.

Hungarian Method

Now, we know enough terminology to describe the algorithm. The main idea is to iteratively find augmenting paths in the equality subgraph until all vertices in $X$ are matched. If such a path cannot be created, we need to modify the labeling so that the equality subgraph gains additional edges while the labeling remains feasible.

Algorithm

  1. Initialize with a feasible labeling.
  2. Initialize two sets to keep track of the vertices in the alternating path: $S \subseteq X$ and $T \subseteq Y.$ Start with $S = T = \emptyset.$
  3. Pick an unmatched vertex $x \in X$ and put it in $S.$
  4. There are two options here:

    1. If there is a vertex $y \in (Y - T) \cap N(S),$ where $N(S)$ denotes the neighbors of the vertices in $S$ in the equality subgraph, then put $y$ in $T.$ This means that we add $y$ to our path. For the $x \in S$ such that $L(x) + L(y) = w(x,y)$, set $x$ as the previous vertex in the current alternating path.
    2. Otherwise, fix the labeling. Let $\Delta_y = \inf\{L(x) + L(y) - w(x, y) : x \in S\}.$ Let $\Delta = \inf\{\Delta_y : y \in Y - T\}.$ For each $x \in S,$ $L^\prime(x) = L(x) - \Delta.$ For each $y \in T,$ $L^\prime(y) = L(y) + \Delta.$ In this way, all edges remain feasible. Consider various $e = (x,y)$:
    • If $x \not\in S$ and $y \not\in T$, $e$ is still feasible since we didn't touch the labels of $x$ and $y$. If this edge is part of the equality subgraph, and these vertices are matched, they still are.
    • If $x \in S$ and $y \in T$, we decreased $L(x)$ by $\Delta$ and increased $L(y)$ by $\Delta$, so $L(x) + L(y) = w(x,y)$ still.
    • If $x \in S$ and $y \not\in T$, $e$ is still feasible since we have decreased $L(x)$ by the minimum of all such $\Delta_y.$
    • If $x \not\in S$ and $y \in T$, we only increased $L(y)$, so the edge is more feasible in a sense.

      Now, if $\Delta_y = \Delta$ for some $y \in Y - T,$ then $y$ becomes an element of $N(S),$ and we can return to step 4.

  5. Now, we have just put $y$ in $T.$ We have two cases here:
    1. $y$ is unmatched. Add $y$ to the end of the alternating path. Our alternating path starts with some unmatched vertex $x \in S$ and ends with the unmatched $y.$ We have an augmenting path, so create a new, bigger matching by inverting our path. Count the number of matched vertices. If it is $n$, we are done. Otherwise, go back to step 2 and repeat.
    2. $y$ is already matched, say, to $x^\prime$. We add two vertices to our path, so our path looks like $$x \longrightarrow \cdots \longrightarrow y \longrightarrow x^\prime,$$ where $x \in S$ is an unmatched vertex. Go to step 4 to find more vertices to add to our path.

Implementation

The algorithm is simple enough, but there are a few complications in implementing it, particularly in finding an efficient way to calculate $\Delta$. If you read carefully, you'll note that if $|Y| \geq |X|,$ we will still find the optimal matching since we will use the edges of higher value first. Let $|X| = n$ and $|Y| = m.$ Let us define various data structures:

  • vector<bool> S(n, false): keeps track of the $X$ vertices in our current alternating path
  • vector<int> xLabel(n, -1): labeling of vertices in $X$
  • vector<int> yLabel(m, 0): labeling of vertices in $Y$
  • vector<int> slack(m, INT_MAX): slack[y] $= \Delta_y$
  • vector<int> prevX(n, -1): prevX[x] is the vertex in $Y$ that comes before $x$ in our alternating path
  • vector<int> prevY(m, -1): prevY[y] is the vertex in $X$ that comes before $y$ in our alternating path
  • vector<int> xToY(n, -1): $x$ is matched to xToY[x] in $Y$
  • vector<int> yToX(m, -1): $y$ is matched to yToX[y] in $X$
  • stack<int, vector<int>> sStack keeps track of the vertices that need to be added to $S$. A queue would work just as well.
  • queue<int> tQueue keeps track of the vertices that are eligible to be added to $T$ (neighbors of $S$ in equality subgraph that aren't in $T$ already). A stack would work just as well.

Our input will be const vector<vector<int>> &weights, which is an $n \times m$ matrix of edge weights. We will return xToY when all vertices in $X$ are matched.

Let's go over the implementation step-by-step:

  1. We initialize all the data structures. To create an initial feasible labeling, we set $L(y) = 0$ for all $y \in Y$ and set $L(x) = \max_{y \in Y}w(x,y).$ This is an $O(NM)$ operation. We set the number of matches to $0$ and also push an unmatched vertex into sStack. Here's the code:

     vector<int> findMaximumAssignment(const vector<vector<int>> &weights) {
       int N = weights.size();
       if (N == 0) return vector<int>();
       int M = weights.back().size();
       if (N > M) throw logic_error("|X| > |Y|, no match is possible");
    
       vector<bool> S(N, false);  // set to keep track of vertices in X on the left in alternating path
       vector<int> xLabel(N, -1);
       vector<int> yLabel(M, 0);
       vector<int> slack(M, INT_MAX); // keep track of how far Y is from being matched with a vertex in S
       for (int i = 0; i < N; ++i) {  // initialize label with max edge to Y
         for (int j = 0; j < M; ++j) xLabel[i] = max(xLabel[i], weights[i][j]);
       }
       // array for memorizing alternating path
       vector<int> prevX(N, -1); // prevX[i] is vertex on the right (in Y) that comes before i in X
       vector<int> prevY(M, -1); // prevY[j] is vertex on the left (in X) that comes before j in Y if slack[j] == 0; otherwise, closest vertex in S
       // maps to keep track of assignment
       vector<int> xToY(N, -1); // xToY[i] is vertex on the right (in Y) matched to i in X
       vector<int> yToX(M, -1); // yToX[j] is vertex on the left (in X) matched to j in Y
    
       stack<int, vector<int>> sStack; // vertices to add to S
       queue<int> tQueue; // neighbors of S to add to T
       int matches = 0;
       sStack.push(0);               // initialize with unmatched vertex
       while (matches < N) {
         ...
       }
       return xToY;
     }
    
  2. Right now $S$ and $T$ are empty. The first order of business is to add something to $S$. When we add to $S$, we initialize slack, find the closest vertex in $S$ to each $y \in Y$, and add neighbors in the equality subgraph to tQueue. By closest vertex, I mean the vertex that would require changing the labeling the least. Here's the code.

     vector<int> findMaximumAssignment(const vector<vector<int>> &weights) {
       ...
       while (matches < N) {
         while (!sStack.empty()) {    // add unmatched vertices to S
           int x = sStack.top(); sStack.pop();
           S[x] = true;
           for (int j = 0; j < M; ++j) {
             if (xLabel[x] + yLabel[j] - weights[x][j] < slack[j]) { // check for neighboring vertices and initialize slack
               slack[j] = xLabel[x] + yLabel[j] - weights[x][j];     // slack >= 0, all feasible initially, and we decrease by min
               prevY[j] = x;         // tree looks like ... --> x -?-> j depending if slack[j] == 0 or not
               if (slack[j] == 0) tQueue.push(j); // edge is in equality subgraph, it is a neighbor of S
             }
           }
         }
         ...
       }
       return xToY;
     }
    
  3. Now, we have two options, depending on whether tQueue is empty. If tQueue is empty, we fix the labeling. We'll restart the while loop with an empty sStack and at least 1 vertex in tQueue after this.

     vector<int> findMaximumAssignment(const vector<vector<int>> &weights) {
       ...
       while (matches < N) {
         ...
         if (tQueue.empty()) {     // no neighboring vertices, fix labeling
           // loop invariant is that |S| > |T|, since we add to S whenever we pop from tQueue
           int delta = INT_MAX;
           for (int j = 0; j < M; ++j) {
             if (slack[j] > 0) delta = min(delta, slack[j]); // only try to add edges that are feasible and not in T
           }
           for (int i = 0; i < N; ++i) {
             if (S[i]) xLabel[i] -= delta; // decrease label of vertices in S
           }
           for (int j = 0; j < M; ++j) {
             if (slack[j] == 0) {   // it's in T
               yLabel[j] += delta;
             } else if (slack[j] > 0 && prevY[j] != -1) { // check that it's feasible and connected to S
               slack[j] -= delta;                         // decrease the distance from S since labels in S were decreased
               if (slack[j] == 0) tQueue.push(j);
             }
           }
         } else {
           ...
         }
       }
       return xToY;
     }
    
  4. Now, we have ensured tQueue won't be empty. Again, we have two options here, depending on whether the vertex at the head of the queue is matched. Let us first deal with the case where it is already matched, so we extend our alternating path with $\cdots \rightarrow y \rightarrow x^\prime.$ Then, we restart our while loop with $x^\prime$ in sStack since $x^\prime$ is part of the path now, too.

     vector<int> findMaximumAssignment(const vector<vector<int>> &weights) {
       ...
       while (matches < N) {
         ...
         if (tQueue.empty()) {     // no neighboring vertices, fix labeling
           ...
         } else {                    // either augment path or vertex is already matched so add to S
           int y = tQueue.front(); tQueue.pop();
           int x = yToX[y];
           if (x == -1) {
             ...    
           } else { // vertex was already matched, new path is [something unmatched in S] --> ... --> prevY[y] --> y --> x
             prevX[x] = y; // update alternating path with edge between x and y, recall prevY[y] is already set
             sStack.push(x);        // add this already matched vertex to S
           }
         }
       }
       return xToY;
     }
    
  5. On the other hand, if the head of the queue $y$ is unmatched, we have found an augmenting path to invert to create a new matching. After establishing this new matching, we discard our alternating path, clear $S$ and $T$, update slack, and count the total matches. If everything in $X$ is matched, we're done. Otherwise, put something in sStack, so we can begin building our new alternating path.

     vector<int> findMaximumAssignment(const vector<vector<int>> &weights) {
       ...
       while (matches < N) {
         ...
         if (tQueue.empty()) {     // no neighboring vertices, fix labeling
           ...
         } else {                    // either augment path or vertex is already matched so add to S
           int y = tQueue.front(); tQueue.pop();
           int x = yToX[y];
           if (x == -1) {
             int currentY = y;
             while (currentY > -1) { // new path is [something unmatched in S] --> ... --> y
               int currentX = prevY[currentY]; // go to left side
               xToY[currentX] = currentY; yToX[currentY] = currentX;
               currentY = prevX[currentX]; // go back to right side
             }
             for (int i = 0; i < N; ++i) prevX[i] = -1, S[i] = false; // reset path and remove everything from tree
             for (int j = 0; j < M; ++j) prevY[j] = -1, slack[j] = INT_MAX; // reset path and slack
             while (!tQueue.empty()) tQueue.pop(); // empty queue
             // check for a perfect match
             matches = 0;
             for (int i = 0; i < N; ++i) {
               if (xToY[i] != -1) {  // if matched
                 ++matches;
               } else if (sStack.empty()) {
                 sStack.push(i); // put an unmatched left side node back in S to start
               }
             }
           } else { // vertex was already matched, new path is [something unmatched in S] --> ... --> prevY[y] --> y --> x
             ...
           }
         }
       }
       return xToY;
     }
    

All in all, the algorithm is $O(n^2m)$ since every time we build an augmenting path we match a vertex in $X$, of which there are $n$. We enter the main loop as many as $n$ times since our path can be up to length $2n$. There are several steps in the loop where we iterate over $Y$, which has size $m$, such as computing the slack or fixing labels. Here's the whole function together:

/* hungarian method for maximum weighted matching of a bipartite graph
 * Consider a weighted bipartite graph G = (X,Y). X is vertices on left side, Y is vertices on the right side
 * We must have |X| <= |Y|. Match each vertex on the left to a distinct vertex on the right with maximum total weight of edges
 */
vector<int> findMaximumAssignment(const vector<vector<int>> &weights) {
  int N = weights.size();
  if (N == 0) return vector<int>();
  int M = weights.back().size();
  if (N > M) throw logic_error("|X| > |Y|, no match is possible");

  vector<bool> S(N, false);  // set to keep track of vertices in X on the left in alternating path
  vector<int> xLabel(N, -1);
  vector<int> yLabel(M, 0);
  vector<int> slack(M, INT_MAX); // keep track of how far Y is from being matched with a vertex in S
  for (int i = 0; i < N; ++i) {  // initialize label with max edge to Y
    for (int j = 0; j < M; ++j) xLabel[i] = max(xLabel[i], weights[i][j]);
  }
  // array for memorizing alternating path
  vector<int> prevX(N, -1); // prevX[i] is vertex on the right (in Y) that comes before i in X
  vector<int> prevY(M, -1); // prevY[j] is vertex on the left (in X) that comes before j in Y if slack[j] == 0; otherwise, closest vertex in S
  // maps to keep track of assignment
  vector<int> xToY(N, -1); // xToY[i] is vertex on the right (in Y) matched to i in X
  vector<int> yToX(M, -1); // yToX[j] is vertex on the left (in X) matched to j in Y

  stack<int, vector<int>> sStack; // vertices to add to S
  queue<int> tQueue; // neighbors of S to add to T
  int matches = 0;
  sStack.push(0);               // initialize with unmatched vertex
  while (matches < N) {
    while (!sStack.empty()) {    // add unmatched vertices to S
      int x = sStack.top(); sStack.pop();
      S[x] = true;
      for (int j = 0; j < M; ++j) {
        if (xLabel[x] + yLabel[j] - weights[x][j] < slack[j]) { // check for neighboring vertices and initialize slack
          slack[j] = xLabel[x] + yLabel[j] - weights[x][j];     // slack >= 0, all feasible initially, and we decrease by min
          prevY[j] = x;         // tree looks like ... --> x -?-> j depending if slack[j] == 0 or not
          if (slack[j] == 0) tQueue.push(j); // edge is in equality subgraph, it is a neighbor of S
        }
      }
    }
    if (tQueue.empty()) {     // no neighboring vertices, fix labeling
      // loop invariant is that |S| > |T|, since we add to S whenever we pop from tQueue
      int delta = INT_MAX;
      for (int j = 0; j < M; ++j) {
        if (slack[j] > 0) delta = min(delta, slack[j]); // only try to add edges that are feasible and not in T
      }
      for (int i = 0; i < N; ++i) {
        if (S[i]) xLabel[i] -= delta; // decrease label of vertices in S
      }
      for (int j = 0; j < M; ++j) {
        if (slack[j] == 0) {   // it's in T
          yLabel[j] += delta;
        } else if (slack[j] > 0 && prevY[j] != -1) { // check that it's feasible and connected to S
          slack[j] -= delta;                         // decrease the distance from S since labels in S were decreased
          if (slack[j] == 0) tQueue.push(j);
        }
      }
    } else {                    // either augment path or vertex is already matched so add to S
      int y = tQueue.front(); tQueue.pop();
      int x = yToX[y];
      if (x == -1) {
        int currentY = y;
        while (currentY > -1) { // new path is [something unmatched in S] --> ... --> y
          int currentX = prevY[currentY]; // go to left side
          xToY[currentX] = currentY; yToX[currentY] = currentX;
          currentY = prevX[currentX]; // go back to right side
        }
        for (int i = 0; i < N; ++i) prevX[i] = -1, S[i] = false; // reset path and remove everything from tree
        for (int j = 0; j < M; ++j) prevY[j] = -1, slack[j] = INT_MAX; // reset path and slack
        while (!tQueue.empty()) tQueue.pop(); // empty queue
        // check for a perfect match
        matches = 0;
        for (int i = 0; i < N; ++i) {
          if (xToY[i] != -1) {  // if matched
            ++matches;
          } else if (sStack.empty()) {
            sStack.push(i); // put an unmatched left side node back in S to start
          }
        }
      } else { // vertex was already matched, new path is [something unmatched in S] --> ... --> prevY[y] --> y --> x
        prevX[x] = y; // update alternating path with edge between x and y, recall prevY[y] is already set
        sStack.push(x);        // add this already matched vertex to S
      }
    }
  }
  return xToY;
}

Sample Run

Using our graph, we will initialize with $L(x) = 10$ for all $x \in X$ and $L(y) = 0$ for all $y \in Y$. In the first run, we greedily match $x_1$ to $y_1$. I'll omit feasible edges since all edges are always feasible. I'll denote edges in the equality subgraph in green, matches in orange, edges in the alternating path and matching in red, and edges in the alternating path but not in the matching in blue.

[Figure: the initial labeling with $L(x) = 10$ for all $x \in X$ and $L(y) = 0$ for all $y \in Y$; $x_1$ is matched to $y_1$.]

On the second run, we add unmatched $x_2$ to $S$. Only $y_1$ is a neighbor of $S$, so we add $y_1$ to $T$. $y_1$ is already matched to $x_1$, so we add $x_1$ to $S$, too.

[Figure: the alternating path $x_2 \rightarrow y_1 \rightarrow x_1$ with $S = \{x_1, x_2\}$ and $T = \{y_1\}$.]

Now, $S$ has no neighbors in the equality subgraph not in $T$, so we need to fix the labeling. We find that $(x_1,y_4)$ is the edge that is closest to being in the equality subgraph. We fix our labeling, add $y_4$ to $T$, then invert our augmenting path to get a new matching.

[Figure: the labels of $x_1$ and $x_2$ decrease to $9$ and the label of $y_1$ increases to $1$; $(x_1, y_4)$ joins the equality subgraph, and inverting the augmenting path matches $x_1$ to $y_4$ and $x_2$ to $y_1$.]

The other vertices will be greedily matched, so we end up with the total weight being 39.

[Figure: the final matching $x_1 \rightarrow y_4,$ $x_2 \rightarrow y_1,$ $x_3 \rightarrow y_2,$ $x_4 \rightarrow y_3$ with total weight $39$.]
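To check this end-to-end, here's a small driver for findMaximumAssignment, using the weight matrix transcribed from the figures above (so treat the numbers as illustrative). The output should match the sample run.

#include <iostream>
#include <vector>

using namespace std;

vector<int> findMaximumAssignment(const vector<vector<int>> &weights); // defined above

int main() {
  vector<vector<int>> weights = {{10, 3, 2, 9},   // edges from x1
                                 {10, 2, 2, 3},   // edges from x2
                                 {4, 10, 3, 3},   // edges from x3
                                 {2, 2, 10, 0}};  // edges from x4
  vector<int> xToY = findMaximumAssignment(weights);
  int totalWeight = 0;
  for (int i = 0; i < (int) xToY.size(); ++i) {
    cout << "x_" << i + 1 << " -> y_" << xToY[i] + 1 << '\n';
    totalWeight += weights[i][xToY[i]];
  }
  cout << "total weight: " << totalWeight << endl; // expect 39
  return 0;
}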

Case Study

My motivation for learning this algorithm was Costly Labels in Round 2 of the 2016 Facebook Hacker Cup. Given a tree (a graph such that there is exactly one path that doesn't cross itself between any two vertices), we want to color the vertices to minimize the cost. Coloring vertex $i$ color $k$ costs us $C_{i,k}$, which is given. Also, if any of a vertex's neighbors share its color, then we have to pay a penalty factor $P$.

The number of colors and vertices is rather small, so we can solve this using some clever brute force with dynamic programming. To my credit, I actually was able to discover the general dynamic programming scheme myself.

We take advantage of the tree structure. Suppose for each vertex, we compute $K(i, k_p, k_i)$, where $i$ is the current vertex, $k_p$ is the color of the parent, and $k_i$ is the color of $i$. $K(i, k_p, k_i)$ will be the minimum cost of this vertex plus all its children. Our solution will be $\min_k K(0, 0, k)$ if we root our tree at vertex $0$.

Now, $K(i, k_p, k_i) = \min(\text{cost with penalty},\text{cost without penalty})$. The cost with penalty is to greedily color all the child vertices, then add $P$. The cost without penalty is where the Hungarian method comes in. We need to assign each child a color different from the parent's in a way that costs the least. Thus, we view $X$ as the children of the current vertex and $Y$ as the colors. We can modify the Hungarian method to do this by setting the edge weights to $M - w(x,y)$, where $M$ is a large number. We give edges going to the parent's color weight $0$, so they won't be used.

All in all, we have a solution that is just barely fast enough as it takes around 30 seconds to run on my computer.

#include <algorithm>
#include <ctime>
#include <exception>
#include <iostream>
#include <queue>
#include <set>
#include <stack>
#include <vector>

using namespace std;

/* hungarian method for maximum weighted matching of a bipartite graph
 * Consider a weighted bipartite graph G = (X,Y). X is vertices on left side, Y is vertices on the right side
 * We must have |X| <= |Y|. Match each vertex on the left to a distinct vertex on the right with maximum total weight of edges
 */
vector<int> findMaximumAssignment(const vector<vector<int>> &weights) {
  ...
  return xToY;
}

struct Tree {
  int root;
  vector<int> parent;
  vector<set<int>> children;
  explicit Tree(int root, int N) : root(root), parent(N, -1), children(N) { }
};

// root the tree, undefined behavior if graph is not a tree
Tree rootTree(int root, const vector<set<int>> &adjacencyList) {
  int N = adjacencyList.size();
  Tree tree(root, N);
  tree.parent[root] = -1;
  stack<int> s; s.push(root);
  while (!s.empty()) {
    int currentVertex = s.top(); s.pop();
    for (int nextVertex : adjacencyList[currentVertex]) {   
      if (nextVertex != tree.parent[currentVertex]) { // don't recurse into parent
        if (tree.parent[nextVertex] != -1) throw logic_error("a cycle was found, this graph is not a tree");
        tree.children[currentVertex].insert(nextVertex);
        tree.parent[nextVertex] = currentVertex;
        s.push(nextVertex);
      }
    }
  }    
  return tree;
}

int computeMinimumCostHelper(const Tree &tree, const vector<vector<int>> &C, int P,
                             int vertex, int parentColor, int vertexColor,
                             vector<vector<vector<int>>> &memo) {
  int N = C.size();             // number of vertices
  int K = C.back().size();      // number of colors
  if (memo[vertex][parentColor][vertexColor] != -1) return memo[vertex][parentColor][vertexColor];
  int parent = tree.parent[vertex];
  int minCost = C[vertex][vertexColor] + P; // first calculate cost if we're willing to take penalty
  vector<vector<int>> childCostByColor; childCostByColor.reserve(tree.children[vertex].size()); // prepare for computation without penalty
  for (int child : tree.children[vertex]) {
    int minChildCost = INT_MAX;
    childCostByColor.emplace_back(); childCostByColor.back().reserve(K);
    for (int childColor = 0; childColor < K; ++childColor) {
      int childCost = computeMinimumCostHelper(tree, C, P, child, vertexColor, childColor, memo);
      // weight 0 effectively ensures that the parent color will not be used for the children, invert edges to find max assignment
      childCostByColor.back().push_back(parent != -1 && parentColor == childColor ? 0 : INT_MAX - childCost);
      minChildCost = min(minChildCost, childCost); // if we're taking penalty just take min cost
    }
    minCost += minChildCost;
  }
  // only count parent if it exists, check that we don't have too many children
  if (childCostByColor.size() < K || (parent == -1 && childCostByColor.size() == K)) {    
    int noPenaltyCost = C[vertex][vertexColor];
    vector<int> optimalAssignment = findMaximumAssignment(childCostByColor); // assign children to distinct colors
    for (int i = 0; i < optimalAssignment.size(); ++i) noPenaltyCost += INT_MAX - childCostByColor[i][optimalAssignment[i]];
    minCost = min(minCost, noPenaltyCost);
  } 
  if (parent == -1) {
    for (int k = 0; k < K; ++k)  memo[vertex][k][vertexColor] = minCost; // doesn't matter what parent color is if no parent
  } else {
    memo[vertex][parentColor][vertexColor] = minCost;
  }
  return minCost;
}

int computeMinimumCost(const Tree &tree, const vector<vector<int>> &C, int P) {
  int N = C.size();             // number of vertices
  int K = C.back().size();      // number of colors
  // memo[vertex index][parent color][vertex color] = cost of coloring current vertex and children (excludes parent cost)
  vector<vector<vector<int>>> memo(N, vector<vector<int>>(K, vector<int>(K, -1)));  
  int minimumCost = INT_MAX;
  for (int k = 0; k < K; ++k) { // vary color of root since root has no parent
    minimumCost = min(minimumCost,
                      computeMinimumCostHelper(tree, C, P, tree.root, 0, k, memo));
  }
  return minimumCost;
}

int main(int argc, char *argv[]) {
  clock_t startTime = clock();
  ios::sync_with_stdio(false); cin.tie(NULL);
  int T;
  cin >> T;
  for (int t = 1; t <= T; ++t) {
    int N, K, P; cin >> N >> K >> P; // total vertices, max label, and penalty fee
    // penalty occurs when a node has at least one neighbor with the same color
    vector<vector<int>> C; C.reserve(N); // C[i][j] cost of coloring vertex i, j
    for (int i = 0; i < N; ++i) {
      C.emplace_back(); C.back().reserve(K);
      for (int j = 0; j < K; ++j) {
        int c; cin >> c; C.back().push_back(c);
      }
    }
    vector<set<int>> adjacencyList(N);
    for (int i = 0; i < N - 1; ++i) {
      int a, b; cin >> a >> b;
      --a; --b;                 // convert to 0-based indexing      
      adjacencyList[a].insert(b);
      adjacencyList[b].insert(a);
    }
    Tree tree = rootTree(0, adjacencyList);
    cout << "Case #" << t << ": " 
         << computeMinimumCost(tree, C, P)
         << '\n';
  }
  // finish
  cout << flush;
  double duration = (clock() - startTime) / (double) CLOCKS_PER_SEC;
  cerr << "Time taken (seconds): " << duration << endl;
  return 0;
}

After getting a perfect score in the first round, I crashed and burned in round 2, and only managed to get 1 problem out of 4 correct. For what it's worth, my solution on the 2nd problem got over 80% of the test cases right. For the 3rd problem, I managed to finish a correct solution 10 minutes after the contest was over. As punishment for such a dismal performance, I'm forcing myself to write up the solutions to the problems of this round.

Boomerang Decoration

This problem was the easiest as it was the only one that I got full credit for. Here's a link to the problem statement, Boomerang Decoration.

I employed a pretty common strategy. Basically, there are $N$ spots to paint. Let us index them $0,1,\ldots,N-1.$ Now, consider a pivot point $p.$ We will paint everything $i < p$ from the left using a prefix. We will paint everything $i \geq p$ from the right using a suffix. Let $L(p)$ be the number of prefix paintings we need if we pivot at point $p.$ Let $R(p)$ be the number of suffix paintings we need if we pivot at point $p.$ Then, we have that our solution is $$\min_{p \in \{0,1,\ldots,N\}} \max\left(L(p), R(p)\right),$$ so we just need to compute $L(p)$ and $R(p)$, which we can do with a recurrence relation and dynamic programming.

Let $x_0,x_1,\ldots,x_{N-1}$ be the initial left half of the boomerang. Let $y_0,y_1,\ldots,y_{N-1}$ be the right half of the boomerang that we're trying to transform the left side into. Let $L^*(p)$ be the number of blocks seen so far in the target, where a block is defined as a contiguous sequence of the same letter. Clearly, $L(0) = L^*(0) = 0$ since we're not painting anything in that case. Then, for $p = 1,2,\ldots,N$, \begin{equation} L^*(p) = \begin{cases} 1 &\text{if}~p=1 \\ L^*(p - 1) &\text{if}~y_{p-1} = y_{p-2} \\ L^*(p - 1) + 1 &\text{if}~y_{p-1} \neq y_{p-2}, \end{cases} ~\text{and}~ L(p) = \begin{cases} L(p-1) &\text{if}~x_{p-1} = y_{p-1} \\ L^*(p) &\text{if}~x_{p-1} \neq y_{p-1}, \end{cases} \end{equation} since if the letters match, there is no need to paint, and if they don't, we only need to paint once for each block of the target.

Similarly, we define $R^*(p)$ as the number of blocks seen from the right. $R(N) = R^*(N) = 0$ since the $N$th index doesn't actually exist. Then, for $p = N-1,N-2,\ldots,0$, \begin{equation} R^*(p) = \begin{cases} 1 &\text{if}~p=N-1 \\ R^*(p + 1) &\text{if}~y_{p} = y_{p+1} \\ R^*(p + 1) + 1 &\text{if}~y_{p} \neq y_{p+1}, \end{cases} ~\text{and}~ R(p) = \begin{cases} R(p+1) &\text{if}~x_{p} = y_{p} \\ R^*(p) &\text{if}~x_{p} \neq y_{p}. \end{cases} \end{equation}

Thus, our run time is $O(N)$. Here's the code that implements this idea.

#include <algorithm>
#include <climits>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

int countSteps(const string &left, const string &right) {
  int N = left.length();
  vector<int> leftSteps; leftSteps.reserve(N + 1);
  leftSteps.push_back(0);
  int leftBlocks = 0;
  for (int i = 0; i < N; ++i) {
    if (i == 0 || right[i] != right[i - 1]) ++leftBlocks;
    if (left[i] == right[i]) {
      leftSteps.push_back(leftSteps.back());
    } else {
      leftSteps.push_back(leftBlocks);
    }
  }
  vector<int> rightSteps(N + 1, 0);
  int rightBlocks = 0;
  for (int i = N - 1; i >= 0; --i) {
    if (i == N - 1 || right[i] != right[i + 1]) ++rightBlocks;
    if (left[i] == right[i]) {
      rightSteps[i] = rightSteps[i + 1];
    } else {
      rightSteps[i] = rightBlocks;
    }
  } 
  int minSteps = INT_MAX;
  for (int i = 0; i <= N; ++i) { 
    // paint everything strictly to the left, paint everything to right including i
    minSteps = min(minSteps, max(leftSteps[i], rightSteps[i]));
  }
  return minSteps;  
}

int main(int argc, char *argv[]) {
  ios::sync_with_stdio(false); cin.tie(NULL);
  int T; cin >> T;
  for (int t = 1; t <= T; ++t) {
    int N; cin >> N;
    string left, right;
    cin >> left >> right;
    cout << "Case #" << t << ": "
         << countSteps(left, right)
         << '\n';
  }
  cout << flush;
  return 0;  
}

Carnival Coins

This problem is a probability problem that also makes use of dynamic programming and a recurrence relation. Here's the problem statement, Carnival Coins. I probably spent too long worrying about precision and trying to find a closed-form solution.

In any case, for this problem, given $N$ coins, we need to calculate the binomial distribution for all $n = 0,1,2,\ldots,N$ with probability $p$. Fix $p \in [0,1]$. Let $X_{n,k}$ be the probability $\mathbb{P}(X_n = k),$ where $X_n \sim \operatorname{Binomial}(n,p)$, that is, $X_n$ is the number of heads if we flip $n$ coins. We use a similar idea to counting an unordered set of $k$ objects from $n$ objects without replacement in Counting Various Things.

Clearly, $X_{n,k} = 0$ if $k > n$. Also, $X_{0,0} = 1$. Now let $n \geq 1.$ Consider the $n$th coin. It's heads with probability $p$ and tails with probability $1 - p$, so for $k = 0,1,\ldots,n$, we have that \begin{equation} X_{n,k} = \begin{cases} (1-p)X_{n-1,0} &\text{if}~k=0 \\ (1-p)X_{n-1,k} + pX_{n-1,k-1} &\text{if}~k=1,2,\ldots,n-1 \\ pX_{n-1,n-1} &\text{if}~k=n \end{cases} \end{equation} since if we flip tails, we must have $k$ heads in the first $n-1$ coins, and if we flip heads, we must have $k - 1$ heads in the first $n-1$ coins.

Now, the problem states that we win if we flip at least $K$ heads, so we really need the tail distribution. Define $Y_{n,k} = \mathbb{P}(X_n \geq k)$. Then, $Y_{n,n} = X_{n,n}$ since we can't have more than $n$ heads, and for $k = 0,1,\ldots,n-1$, \begin{equation} Y_{n,k} = X_{n,k} + Y_{n,k+1}. \end{equation}

We can compute this all in $O(N^2)$ time. I was hesitant to do this calculation since $p$ is a double, and I was afraid of the loss of precision, but it turns out that using a long double table works.

Now, suppose we want to maximize expected value with $N$ coins. We can play all $N$ coins at once. Then, our probability of winning is $Y_{N,K}.$ Our second option is to break up our coins into two groups, say of sizes $m$ and $N-m$. These two groups may further be broken up into more groups. Suppose we know the optimal strategy for $n = 1,2,\ldots,N-1$ coins. Let $E[n]$ be the maximum expected value when playing with $n$ coins. The maximum expected value of playing with the two groups, $m$ coins and $N-m$ coins, is $E[m] + E[N-m]$ by linearity of expectation.

This strategy only makes sense if both of the groups are of size at least $K$. Clearly, $E[K] = Y_{K,K} = X_{K,K}.$ Then, for all $n = 0,1,2, \ldots, N,$ we have \begin{equation} E[n] = \begin{cases} 0, &\text{if}~n < K \\ Y_{K,K}, &\text{if}~n = K \\ \max\left(Y_{n,K}, \sup\left\{E[m] + E[n-m] : m = K,K+1,\ldots,\lfloor n/2\rfloor\right\}\right), &\text{if}~n = K + 1,\ldots,N. \end{cases} \end{equation}

Our solution is $E[N]$. Since $N \geq K$, running time is $O(N^2)$. Here's the code.

#include <iostream>
#include <iomanip>
#include <vector>

using namespace std;

long double P[3001][3001];      // pre allocate memory for probability

double computeExpectedPrizes(int N, int K, double p) {
  // P[n][k] = P(X_n >= k), where X_n ~ Binomial(n, p)
  // first calculate P(X_n = k)
  P[0][0] = 1;
  P[1][0] = 1 - p; P[1][1] = p;
  for (int n = 2; n <= N; ++n) {
    P[n][0] = P[n-1][0]*(1-p);
    for (int k = 1; k < N; ++k) {
      // probability of hitting k when nth coin is heads and nth coin is tails
      P[n][k] = p*P[n-1][k-1] + (1-p)*P[n-1][k];
    }
    P[n][n] = P[n-1][n-1]*p;
  }
  // make cumulative
  for (int n = 1; n <= N; ++n) {
    for (int k = n - 1; k >= 0; --k) P[n][k] += P[n][k+1];
  }

  vector<long double> maxExpectedValue(N + 1, 0); // maxExpectedValue[n] is max expected value for n coins
  // two cases: all coins in 1 group or coins in more than 1 group
  for (int n = 0; n <= N; ++n) maxExpectedValue[n] = P[n][K]; // put all the coins in 1 group
  for (int n = 1; n <= N; ++n) {
    // loop invariant is that we know maxExpectedValue for 0,...,n - 1
    for (int m = K; m <= n/2; ++m) { // just do half by symmetry
      // split coins into two parts, play separately with each part
      maxExpectedValue[n] = max(maxExpectedValue[n], 
                                maxExpectedValue[m] + maxExpectedValue[n - m]);
    }
  }  
  return maxExpectedValue.back();
}

int main(int argc, char *argv[]) {
  ios::sync_with_stdio(false); cin.tie(NULL);
  int T; cin >> T;
  cout << fixed;
  cout << setprecision(9);
  for (int t = 1; t <= T; ++t) {
    int N, K;                   // coins and goal
    cin >> N >> K;
    double p; cin >> p;         // probability of coin landing heads
    cout << "Case #" << t << ": "
         << computeExpectedPrizes(N, K, p)
         << '\n';
  }
  cout << flush;
  return 0;
}

Snakes and Ladders

This problem was the one I finished 10 minutes after the contest ended. I had everything right, but for some reason, I got stuck on deriving a fairly simple recurrence relation in the last 10 minutes. Here's the problem statement, Snakes and Ladders.

A couple of key insights must be made here.

  • A snake occurs between ladders of the same height, so group the ladders by height.
  • Taller ladders obstruct snakes, so process the ladders by descending height, and store obstructions as we go.
  • Deal with the ladders in unobstructed blocks, so sort them by position to put the ladders in contiguous blocks.

Now, the cost of feeding a snake is the square of its length, which makes processing each unobstructed block a little bit tricky. This is where I got stuck during the contest. The naive way is to compute all the pairwise distances and square them. This isn't fast enough. Here's a better method.

Let $x_1,x_2,\ldots,x_N$ be the positions of our ladders, such that $x_1 \leq x_2 \leq \cdots \leq x_N$. Now, for $n \geq 2,$ let $$ C_n = \sum_{k=1}^{n-1} (x_n - x_k)^2 ~\text{and}~ S_n = \sum_{k=1}^{n-1} (x_n - x_k) ,$$ so the total cost of feeding this block is $C = \sum_{n=2}^N C_n$. We have that \begin{align*} C_n &= \sum_{k=1}^{n-1} (x_n - x_k)^2 = \sum_{k=1}^{n-1}\left((x_n - x_{n-1}) + (x_{n-1} - x_k)\right)^2 \\ &= \sum_{k=1}^{n-1}\left[(x_n - x_{n-1})^2 + 2(x_n-x_{n-1})(x_{n-1} - x_k) + (x_{n-1}-x_k)^2\right]\\ &= C_{n-1} + (n-1)(x_n - x_{n-1})^2 + 2(x_n-x_{n-1})\sum_{k=1}^{n-1}(x_{n-1} - x_k)\\ &= C_{n-1} + (n-1)(x_n - x_{n-1})^2 + 2(x_n-x_{n-1})S_{n-1} \end{align*} since the last term drops out of the summation when $k = n - 1$. Then, we can update $S_n = S_{n-1} + (n-1)(x_n - x_{n-1}).$ We let $C_1 = S_1 = 0.$ Thus, we can compute $C_n$ in $O(1)$ time if we already know $C_{n-1}.$
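Before the full solution, here's a minimal sketch of just this recurrence on a single sorted block, leaving out the modular arithmetic and the obstruction bookkeeping. Note that with the $0$-based loop index n below, the number of previous ladders is n rather than $n - 1$.

#include <iostream>
#include <vector>

using namespace std;

// Sum of (x_n - x_k)^2 over all pairs k < n in one sorted block,
// computed in O(N) with the running sums C_n and S_n derived above.
long long blockCost(const vector<long long> &x) {
  long long C = 0, S = 0, total = 0;
  for (int n = 1; n < (int) x.size(); ++n) {
    long long d = x[n] - x[n - 1];
    C += n*d*d + 2*d*S; // C_n = C_{n-1} + n(x_n - x_{n-1})^2 + 2(x_n - x_{n-1})S_{n-1}
    S += n*d;           // S_n = S_{n-1} + n(x_n - x_{n-1})
    total += C;
  }
  return total;
}

int main() {
  cout << blockCost({1, 2, 4}) << endl; // (2-1)^2 + (4-1)^2 + (4-2)^2 = 14
  return 0;
}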

Since we only look at each ladder once, the biggest cost is sorting, so the running time is $O(N\log N)$, where $N$ is the number of ladders. Here's the code.

#include <algorithm>
#include <iostream>
#include <map>
#include <set>
#include <vector>

using namespace std;

const int MOD = 1000000007;

int computeFeedingCost(const map<int, vector<int>> &ladders) {  
  set<int> blockEnds; blockEnds.insert(1000000000); // block delimiter
  long long cost = 0;
  // go through heights in decreasing order
  for (map<int, vector<int>>::const_reverse_iterator hIt = ladders.crbegin(); hIt != ladders.crend(); ++hIt) { 
    int currentLadder = 0;    
    int N = (hIt -> second).size(); // number of ladders at this height
    for (int blockEnd : blockEnds) {  // go block by block, where blocks are delimited by blockEnds vector
      int blockStart = currentLadder; // remember the beginning of the block
      long long xSum = 0;
      long long xSquaredSum = 0;
      while (currentLadder < N && (hIt -> second)[currentLadder] <= blockEnd) {
        if (currentLadder > blockStart) {
          // difference in position from this ladder to previous ladder
          long long xDiff = (hIt -> second)[currentLadder] - (hIt -> second)[currentLadder-1]; 
          xSquaredSum += (currentLadder - blockStart)*(xDiff*xDiff) + 2*xDiff*xSum; xSquaredSum %= MOD;
          xSum += (currentLadder - blockStart)*xDiff; xSum %= MOD;
          cost += xSquaredSum; cost %= MOD; 
        }
        if ((hIt -> second)[currentLadder] == blockEnd) {
          break;                // start next block from this ladder
        } else {
          ++currentLadder;
        }
      }
    }
    for (int newBlockEnd : hIt -> second) blockEnds.insert(newBlockEnd);
  }
  return cost;
}

int main(int argc, char *argv[]) {
  ios::sync_with_stdio(false); cin.tie(NULL);
  int T; cin >> T;
  for (int t = 1; t <= T; ++t) {
    int N;                      // number of ladders
    cin >> N;
    map<int, vector<int>> ladders; // ladders by height
    for (int n = 0; n < N; ++n) {
      int x, h; cin >> x >> h;  // ladder position and height
      ladders[h].push_back(x);
    }
    for (map<int, vector<int>>::iterator it = ladders.begin(); it != ladders.end(); ++it) {
      // within each height sort by position
      sort((it -> second).begin(), (it -> second).end());
    }
    cout << "Case #" << t << ": "
         << computeFeedingCost(ladders)
         << '\n';
  }
  cout << flush;
  return 0;
}

Costly Labels

This problem is much more involved. I'll write about it in a separate post, Assignment Problem and the Hungarian Method.



When stuck inside because of the snow, what else is there to do but bake and code? I made these brownies here. They are amazingly moist, but I'll probably cut down on the sugar next time I make them.

On the algorithms side, I finally got to the Platinum Division on the USA Computing Olympiad. It's primarily for high school students, but I find it fun to participate anyway.

One of the competition's problems employs a clever usage of binary search that I want to write about. Basically, there are times when the solution is very hard to compute, but it is not too costly to verify. If the solution is numerical and bounded, we can guess solutions with binary search. I've actually been quite familiar with this strategy for a year now, but somehow I missed it in this particular problem. Here, we use a two-dimensional binary search. Thankfully, I got enough points to get promoted to the next division anyway.

Angry Cows

Here's the problem statement:

Bessie the cow has designed what she thinks will be the next big hit video game: "Angry Cows". The premise, which she believes is completely original, is that the player shoots a cow with a slingshot into a one-dimensional scene consisting of a set of hay bales located at various points on a number line; the cow lands with sufficient force to detonate the hay bales in close proximity to her landing site, which in turn might set off a chain reaction that causes additional hay bales to explode. The goal is to use a single cow to start a chain reaction that detonates all the hay bales. There are $N$ hay bales located at distinct integer positions $x_1,x_2,\ldots,x_N$ on the number line. If a cow is launched with power $R$ landing at position $x$, this will cause a blast of "radius $R$", engulfing all hay bales within the range $x−R \ldots x+R$. These hay bales then themselves explode (all simultaneously), each with a blast radius of $R−1$. Any not-yet-exploded bales caught in these blasts then all explode (all simultaneously) with blast radius $R−2$, and so on.

Please determine the minimum amount of power $R$ with which a single cow may be launched so that, if it lands at an appropriate location, it will cause subsequent detonation of every single hay bale in the scene.

INPUT FORMAT (file angry.in):

The first line of input contains $N$ ($2 \leq N \leq 50,000$). The remaining $N$ lines all contain integers $x_1 \ldots x_N$ (each in the range $0 \ldots 1,000,000,000$).

OUTPUT FORMAT (file angry.out):

Please output the minimum power $R$ with which a cow must be launched in order to detonate all the hay bales. Answers should be rounded and printed to exactly $1$ decimal point.

So, assume the hay bales are sorted: $x_1 \leq \cdots \leq x_N$. The minimum blast radius is at most $(x_N - x_1)/2$ since we can just launch such a cow at the midpoint and destroy all the hay bales without any chain reaction. It's also worth noting that if the optimal blast radius is $R^*,$ then $2R^* \in \mathbb{Z}$, that is, twice the optimal blast radius is an integer: since all the hay bales are located at integer coordinates, adding less than $0.5$ to the radius will never encompass another hay bale. Finally, the last observation is that we should fire the cow so that the very left of the blast lines up exactly with a hay bale since we would gain nothing by having that hay bale strictly inside the blast radius.

Let $L$ be the index of the leftmost hay bale hit by the initial blast. Thus, we could brute force by trying all $2R^* \in \{0,1,\ldots,x_N-x_1\}$ and $L \in \{1,2,\ldots,N\}$. To check if such values work, we can simulate the chain reaction which takes $O(N)$ time. Thus, brute force would take $O\left(N^2(x_N - x_1)\right)$ time. This is where binary search comes in.

During the contest, it was obvious to me that we should do a binary search to find $2R^*$ considering that $x_N - x_1$ could be as large as $10^9$. However, this is not fast enough, as that only gets us $O\left(N^2\log(x_N-x_1)\right)$ time, and $N^2$ can be as large as $2.5 \times 10^9$. After sleeping on it, I made the key insight that we can binary search on the index of the leftmost hay bale, too, so now we have $O\left(N\log(N)\log(x_N-x_1)\right)$ time, which is adequate.

To make this explicit, here's the code:

import java.io.*;
import java.util.*;

public class angry {

    /* check that all the hay bales to the left of idx explode 
     * if we throw cow of power T/2 at hayBales[idx] + T/2
     */
    public static boolean leftExplodes(int idx, int T, int[] hayBales) {
        double currentFloor = hayBales[idx];
        double currentR = T/2.0;
        int left; // leftmost exploded bale
        for (left = idx; left >= 0 && hayBales[left] >= currentFloor; --left) {
            if (left == 0 || hayBales[left - 1] >= currentFloor) continue;
            currentR -= 1.0;
            currentFloor = hayBales[left] - currentR;
        }
        return left == -1;
    }

    public static boolean isDiameterPossible(int T, int[] hayBales) {
        int N = hayBales.length;
        int leftMin = 0; // inclusive
        int leftMax = N; // exclusive         
        int leftIdx = leftMin + (leftMax - leftMin)/2;
        while (leftMin < leftMax) { // find smallest left such that this doesn't work
            if (leftExplodes(leftIdx, T, hayBales)) {
                leftMin = leftIdx + 1;
            } else {
                leftMax = leftIdx;
            }
            leftIdx = leftMin + (leftMax - leftMin)/2;            
        }        
        --leftIdx; // this works
        // now check that the right explodes
        double currentCeiling = hayBales[leftIdx] + T;
        double currentR = T/2.0;
        int right;
        for (right = leftIdx; right < N && hayBales[right] <= currentCeiling; ++right) {
            if (right == N - 1 || hayBales[right + 1] <= currentCeiling)  continue;
            currentR -= 1.0;
            currentCeiling = hayBales[right] + currentR;
        }        
        return right == N;        
    }        

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader("angry.in"));
        PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter("angry.out")));
        int N = Integer.parseInt(in.readLine());
        int[] hayBales = new int[N];
        for (int n = 0; n < N; ++n) hayBales[n] = Integer.parseInt(in.readLine());
        Arrays.sort(hayBales);

        // search for T = 2R
        int minT = 0; int maxT = hayBales[N - 1] - hayBales[0];
        int T = minT + (maxT - minT)/2;       
        while (minT < maxT) { // find smallest T that works
            if (isDiameterPossible(T, hayBales)) {
                maxT = T;
            } else {
                minT = T + 1;
            }
            T = minT + (maxT - minT)/2;
        }
        out.printf("%.1f\n", T/2.0);
        in.close();
        out.close();
    }

}


Some time ago, I was doing a problem on HackerRank that introduced me to two new data structures that I want to write about. The problem is called Cross the River.

The premise is this:

You're standing on a shore of a river. You'd like to reach the opposite shore.

The river can be described with two straight lines on the Cartesian plane, describing the shores. The shore you're standing on is $Y=0$ and another one is $Y=H$.

There are some rocks in the river. Each rock is described with its coordinates and the number of points you'll gain in case you step on this rock.

You can choose the starting position arbitrarily on the first shore. Then, you will make jumps. More precisely, you can jump to the position $(X_2,Y_2)$ from the position $(X_1,Y_1)$ in case $\left|Y_2−Y_1\right| \leq dH$, $\left|X_2−X_1\right| \leq dW$ and $Y_2>Y_1$. You can jump only on the rocks and the shores.

What is the maximal sum of scores of all the used rocks you can obtain so that you cross the river, i.e. get to the opposite shore?

No two rocks share the same position, and it is guaranteed that there exists a way to cross the river.

Now, my first instinct was to use dynamic programming. If $Z_i$ is the point value of rock $i$, and $S_i$ is the max score at rock $i$, then $$ S_i = \begin{cases} Z_i + \max\{S_j : 1 \leq Y_i - Y_j \leq dH,~|X_i - X_j| \leq dW\} &\text{if rock $i$ is reachable} \\ -\infty &\text{otherwise,} \end{cases} $$ where we assume the existence of rocks with $Y$ coordinate $0$ and $0$ point value for all $X.$

Thus, we can sort the rocks by their $Y$ coordinate and visit them in order. However, we run into the problem that if $dW$ and $dH$ are large we may need to check a large number of rocks visited previously, so this approach is $O(N^2).$
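To make the quadratic approach concrete, here's a rough C++ sketch of that recurrence. The Rock struct is my own, and the shores are simplified away: a real solution would also take the max over rocks within $dH$ of the far shore.

#include <algorithm>
#include <climits>
#include <vector>

using namespace std;

struct Rock { long long x, y, z; }; // position (x, y) and point value z

// O(N^2) dynamic programming over rocks sorted by y; scores[i] is the max
// score attainable at rock i, or LLONG_MIN if rock i is unreachable.
vector<long long> maxScores(vector<Rock> rocks, long long dW, long long dH) {
  sort(rocks.begin(), rocks.end(),
       [](const Rock &a, const Rock &b) { return a.y < b.y; });
  vector<long long> scores(rocks.size(), LLONG_MIN);
  for (int i = 0; i < (int) rocks.size(); ++i) {
    if (rocks[i].y <= dH) scores[i] = rocks[i].z; // reachable from the shore y = 0
    for (int j = 0; j < i; ++j) { // check every previous rock, hence O(N^2)
      if (scores[j] == LLONG_MIN) continue; // unreachable rocks don't help
      long long dx = rocks[i].x - rocks[j].x;
      if (dx < 0) dx = -dx;
      if (rocks[i].y > rocks[j].y && rocks[i].y - rocks[j].y <= dH && dx <= dW)
        scores[i] = max(scores[i], scores[j] + rocks[i].z);
    }
  }
  return scores;
}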

My dynamic programming approach was the right idea, but it needs some improvements. Somehow, we need to speed up the process of looking through the previous rocks. To do this, we do two things:

  1. Implement a way to quickly find the max score in a range $[X-dW, X + dW]$
  2. Only store the scores of rocks in range $[Y-dH, Y)$

To accomplish these tasks, we use two specialized data structures.

Segment Trees

Segment trees solve the first problem. They provide a way to query a value (such as a maximum or minimum) over a range and update these values in $\log$ time. The key idea is to use a binary tree, where the nodes correspond to segments instead of indices.

For example, suppose that we have $N$ indices $i = 0,1,\ldots, N-1$ with corresponding values $v_i.$ Let $k$ be the smallest integer such that $2^k \geq N.$ The root node of our binary tree will be the interval $[0,2^k).$ The first left child will be $[0,2^{k-1}),$ and the first right child will be $[2^{k-1},2^k).$ In general, for a node $[a,b)$ with $b - a > 1$, the left child is $[a,(a+b)/2),$ and the right child is $[(a+b)/2,b).$ Otherwise, if $b - a = 1$, there are no children, and the node is a leaf. For example, if $5 \leq N \leq 8$, our segment tree looks like this.

[Figure: the segment tree rooted at $[0, 8)$: its children $[0, 4)$ and $[4, 8)$, their children $[0, 2), [2, 4), [4, 6), [6, 8)$, and the leaves $[0, 1)$ through $[7, 8)$.]

In general, there are $2^0 + 2^1 + 2^2 + \cdots + 2^k = 2^{k+1} - 1$ nodes needed. $2N - 1 \leq 2^{k+1} - 1 \leq 2^2(N-1) - 1$, so the amount of memory needed is $O(N).$ Here's the code for constructing the tree.

class MaxSegmentTree {
    private long[] maxes;        
    private int size;
    public MaxSegmentTree(int size) {
        int actualSize = 1;
        while (actualSize < size) actualSize *= 2;
        this.size = actualSize;
        // if size is 2^k, we need 2^(k+1) - 1 nodes for all the intervals
        maxes = new long[2*actualSize - 1];
        Arrays.fill(maxes, Long.MIN_VALUE);
    }
    ...
}

Now, for each node $[a,b),$ we store a value $\max(v_a,v_{a+1},\ldots,v_{b-1}).$ An update call consists of two parameters, an index $k$ and a new $v_k.$ We would traverse the binary tree until we reach the node $[k, k+1)$ and update that node. Then, we update the max of each ancestor by taking the max of its left and right child since the segment of child is always contained in the segment of the parent. In practice, this is done recursively like this.

class MaxSegmentTree {
    ...
    public long set(int key, long value) {
        return set(key, value, 0, 0, this.size);
    }
    /** 
     * @param node index of node since binary tree is implement with array
     * @param l    lower bound of segment (inclusive)
     * @param r    upper bound of segment (exclusive)
     */
    private long set(int key, long value,
                     int node, int l, int r) {
        // if not in range, do not set anything
        if (key < l || key >= r) return maxes[node]; 
        if (l + 1 == r) {
            // return when you reach a leaf
            maxes[node] = value;
            return value;
        }
        int mid = l + (r-l)/2;
        // left node
        long left = set(key, value, 2*(node + 1) - 1, l, mid);
        // right node
        long right = set(key, value, 2*(node + 1), mid, r);
        maxes[node] = Math.max(left, right);
        return maxes[node];
    }
    ...
}

A range max query takes two parameters: the lower bound of the range and the upper bound of the range, in the form $[i,j).$ We obtain the max recursively. Let $[l,r)$ be the segment corresponding to a node. If $[l,r) \subseteq [i,j),$ we return the max associated with $[l,r)$. If $[l,r) \cap [i,j) = \emptyset,$ we ignore this node. Otherwise, $[l,r) \cap [i,j) \neq \emptyset,$ and $\exists k \in [l,r)$ such that $k \not\in [i,j),$ so $l < i < r$ or $l < j < r.$ In this case, we descend to the child nodes. The algorithm looks like this.

class MaxSegmentTree {
    ...
    /** 
     * @param i from index, inclusive
     * @param j to index, exclusive
     * @return the max value in a segment.
     */
    public long max(int i, int j) {
        return max(i, j, 0, 0, this.size);
    }

    private long max(int i, int j, int node, int l, int r) {
        // if in interval
        if (i <= l && r <= j) return maxes[node];
        // if completely outside interval
        if (j <= l || i >= r ) return Long.MIN_VALUE;
        int mid = l + (r-l)/2;    
        long left = max(i, j, 2*(node+1) - 1, l, mid);
        long right = max(i, j, 2*(node+1), mid, r);
        return Math.max(left, right);
    }
    ...
}

Let me prove that this operation is $O(\log_2 N).$ To simplify things, let us assume that $N$ is a power of $2$, so $2^k = N.$ I claim that the worst case is $[i,j) = [1, 2^k - 1).$ Clearly, this is true when $k = 2$ since we'll have to visit all the nodes but $[0,1)$ and $[3,4),$ so we visit $5 = 4k - 3 = 4\log_2 N - 3$ nodes.

Now, for our induction hypothesis we assume that the operation is $O(\log_2 N)$ for $1,2,\ldots, k - 1$. Then, for some $k$, we can assume that $i < 2^{k-1}$ and $j > 2^{k-1}$ since otherwise, we only descend one half of the tree, and it reduces to the $k - 1$ case. Now, given $[i, j)$ and some node $[l,r)$, we'll stop there if $[i,j) \cap [l,r) = \emptyset$ or $[l,r) \subseteq [i,j).$ Otherwise, we'll descend to the node's children. Now, we have assumed that $i < 2^{k-1} < j,$ so if we're on the left side of the tree, $j > r$ for all such nodes. We're not going to visit any nodes with $r \leq i,$ we'll stop at nodes with $l \geq i$ and compare their max, and we'll descend into nodes with $l < i < r$. At any given node on the left side, if $[l,r)$ is not a leaf and $l < i < r$, we'll choose to descend. Let the left child be $[l_l, r_l)$ and the right child be $[l_r,r_r)$. The two child segments are disjoint, so we will only choose to descend one of them since only one of $l_l < i < r_l$ or $l_r < i < r_r$ can be true. Since $l_l = l < i$, we'll stop only at the right child if $l_r = i.$ If $i$ is not odd, we'll stop before we reach a leaf. Thus, the worst case is when $i$ is odd.

On the right side, we reach a similar conclusion, where we stop when $r_l = j,$ and so the worst case is when $j$ is odd. To see this visually, here's an example of the query $[1,7)$ when $k = 3.$ Nodes where we visit the children are colored red. Nodes where we compare a max are colored green.

[Figure: the segment tree over $[0,8)$ for the query $[1,7)$. The red nodes $[0,8)$, $[0,4)$, $[0,2)$, $[4,8)$, and $[6,8)$ are where we descend; the green nodes $[1,2)$, $[2,4)$, $[4,6)$, and $[6,7)$ are where we compare a max.]

Thus, we'll descend at $2k - 1 = 2\log_2 N - 1$ nodes and compare maxes at $2(k-1) = 2(\log_2 N - 1)$ nodes, so $4\log_2 N - 3$ nodes are visited.

Max Queues

Now, the segment tree contains the max score at each $X$ coordinate, but we want our segment tree to only contain values corresponding to rocks that are within range of our current position. If our current height is $Y$, we want rock $j$ in the tree only if $0 < Y - Y_j \leq dH.$

Recall that we visit the rocks in order of their $Y$ coordinate. Thus, for each $X$ coordinate we add the rock to some data structure when we visit it, and we remove it when it becomes out of range. Since rocks with smaller $Y$ coordinates become out of range first, this is a first in, first out (FIFO) situation, so we use a queue.

However, when removing a rock, we need to know when to update the segment tree, so the queue needs to keep track of maxes. We can do this with two queues. The primary queue is a normal queue. The secondary queue will contain a non-strict decreasing sequence. Upon adding to the queue, we maintain this invariant by removing all the smaller elements from the tail of the secondary queue. In this way, the head of the secondary queue will always contain the max element since it would have been removed otherwise. When removing an element, if the two heads are equal in value, we remove the head of each queue; otherwise, we remove only the head of the primary queue. Here is the code.

class MaxQueue<E extends Comparable<? super E>> extends ArrayDeque<E> {
    private Deque<E> q; // deque of a non-strict decreasing subsequence; its head is the max
    public MaxQueue() {
        super();
        q = new ArrayDeque<E>();
    }

    @Override
    public void clear() {
        q.clear();
        super.clear();
    }

    @Override
    public E poll() { 
        if (!super.isEmpty() && q.peek().equals(super.peek())) q.poll();
        return super.poll();
    }

    @Override
    public E remove() {
        if (!super.isEmpty() && q.peek().equals(super.peek())) q.remove();
        return super.remove();
    }        

    @Override
    public boolean add(E e) {
        // remove all the smaller elements from the tail to keep q decreasing
        while (!q.isEmpty() && q.peekLast().compareTo(e) < 0) q.pollLast();
        q.add(e);
        return super.add(e);
    }

    @Override
    public boolean offer(E e) {
        // remove all the smaller elements from the tail to keep q decreasing
        while (!q.isEmpty() && q.peekLast().compareTo(e) < 0) q.pollLast();
        q.offer(e);
        return super.offer(e);
    }

    public E max() {
        return q.element();
    }        
}
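
To see why the smaller elements must be dropped from the tail of the secondary queue, here's a short trace; the comments give the expected output.

MaxQueue<Long> q = new MaxQueue<Long>();
q.add(5L); q.add(3L); q.add(4L); // adding 4 evicts 3 from the secondary queue
System.out.println(q.max());     // 5
q.poll();                        // 5 leaves both queues
System.out.println(q.max());     // 4
q.poll();                        // 3 leaves only the primary queue
System.out.println(q.max());     // 4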

Solution

With these two data structures the solution is pretty short. We keep one segment tree that stores the current max at each $X$ coordinate. For each $X$, we keep a queue to keep track of all possible maxes. The one tricky part is to make sure that we look at all rocks at a certain height before updating the segment tree since lateral moves are not possible. Each rock is only added and removed from a queue once, and we can find the max in $\log$ time, so the running time is $O(N\log N)$, where $N$ is the number of rocks. Here's the code.

public class CrossTheRiver {

    private static final int MAX_X = 100000;
    ...
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        PrintWriter out = new PrintWriter(new BufferedWriter(new OutputStreamWriter(System.out)));
        StringTokenizer st = new StringTokenizer(in.readLine());
        int N = Integer.parseInt(st.nextToken()); // rocks
        int H = Integer.parseInt(st.nextToken()); // height
        int dH = Integer.parseInt(st.nextToken()); // max y jump
        int dW = Integer.parseInt(st.nextToken()); // max x jump        
        Rock[] rocks = new Rock[N];
        for (int i = 0; i < N; ++i) { // read through rocks
            st = new StringTokenizer(in.readLine());           
            int Y = Integer.parseInt(st.nextToken());
            int X = Integer.parseInt(st.nextToken()); // 0 index
            int Z = Integer.parseInt(st.nextToken());
            rocks[i] = new Rock(X, Y, Z);
        }        
        Arrays.sort(rocks);                
        long[] cumulativeScore = new long[N];
        MaxSegmentTree sTree = new MaxSegmentTree(MAX_X + 1);
        ArrayList<MaxQueue<Long>> maxX = new ArrayList<MaxQueue<Long>>(MAX_X + 1);
        for (int i = 0; i <= MAX_X; ++i) maxX.add(new MaxQueue<Long>());
        int i = 0; // current rock
        int j = 0; // in range rocks
        while (i < N) {
            int currentY = rocks[i].y;
            while (rocks[j].y < currentY - dH) {
                // clear out rocks that are out of range
                maxX.get(rocks[j].x).poll();
                if (maxX.get(rocks[j].x).isEmpty()) {
                    sTree.set(rocks[j].x, Long.MIN_VALUE);
                } else {
                    sTree.set(rocks[j].x, maxX.get(rocks[j].x).max());
                }
                ++j;
            }            
            while (i < N && rocks[i].y == currentY) {
                // get previous max score from segment tree
                long previousScore = sTree.max(rocks[i].x - dW, rocks[i].x + dW + 1);
                if (rocks[i].y <= dH && previousScore < 0) previousScore = 0;
                if (previousScore > Long.MIN_VALUE) {  // make sure rock is reachable
                    cumulativeScore[i] = rocks[i].score + previousScore;
                    // keep max queue up to date
                    maxX.get(rocks[i].x).add(cumulativeScore[i]); 
                }
                ++i;
            }       
            // now update segment tree
            for (int k = i - 1; k >= 0 && rocks[k].y == currentY; --k) {
                // guard: unreachable rocks were never enqueued, so the queue may be empty
                if (!maxX.get(rocks[k].x).isEmpty()
                    && cumulativeScore[k] == maxX.get(rocks[k].x).max()) {
                    sTree.set(rocks[k].x, cumulativeScore[k]);
                }
            }
        }

        long maxScore = Long.MIN_VALUE;
        for (i = N - 1; i >= 0 && H - rocks[i].y <= dH; --i) {
            if (maxScore < cumulativeScore[i]) maxScore = cumulativeScore[i];
        }
        out.println(maxScore);
        in.close();
        out.close();
    }
}

I was recently doing a problem that involved depth-first search to find components of a graph. For me, the most natural way to write this is recursion. It's pretty simple:

void floodFill(int vertex, int currentComponent, const vector<set<int>> &edgeList,
               vector<int> &component, vector<vector<int>> &componentSets) {
  component[vertex] = currentComponent;
  componentSets.back().push_back(vertex);
  for (int i : edgeList[vertex]) {
    if (component[i] == -1) {
      floodFill(i, currentComponent, edgeList, component, componentSets);         
    }
  }
}

vector<vector<int>> findComponents(const vector<set<int>> &edgeList) {
  int N = edgeList.size();
  vector<int> component(N, -1);  
  vector<vector<int>> componentSets;
  int currentComponent = 0;
  for (int i = 0; i < N; ++i) {
    if (component[i] == -1) {
      componentSets.emplace_back();
      floodFill(i, currentComponent++, edgeList, component, componentSets);
    }
  }  
  return componentSets;  
}

component and componentSets are shared state variables, so we pass them by reference. It seems simple enough, right? Unfortunately, the stack size is very limited on OS X. Trying to increase the stack size, I get

$ ulimit -s 65536
-bash: ulimit: stack size: cannot modify limit: Operation not permitted

The most that I can increase the stack size to with ulimit is a mere 8192 KB, which only works on a graph with fewer than 15,000 nodes.

Now the easiest solution is to increase the stack size with gcc like this

$ g++ -Wl,-stack_size,0x4000000 -stdlib=libc++ -std=c++0x D.cpp

We specify the stack size in bytes, so 0x4000000 is $4 \times 16^6 = 67108864$, which is 64 MB.

I want to cover a better solution that works everywhere, though. We can use an explicit stack for recursion:

vector<vector<int>> findComponents(const vector<set<int>> &edgeList) {
  int N = edgeList.size();
  vector<int> component(N, -1);  
  vector<vector<int>> componentSets;
  int currentComponent = 0;
  for (int i = 0; i < N; ++i) {
    if (component[i] == -1) {
      componentSets.emplace_back();
      stack<int> s; 
      s.push(i);
      component[i] = currentComponent;
      componentSets.back().push_back(i);
      while (!s.empty()) {
        int vertex = s.top(); s.pop();
        for (int j : edgeList[vertex]) {
          if (component[j] == -1) {
            component[j] = currentComponent;
            componentSets.back().push_back(j);
            s.push(j);
          }
        }
      }
      ++currentComponent;
    }
  }  
  return componentSets;  
}

This solution is actually more memory efficient since each frame of the implicit stack in the recursive solution contains the function state, which includes 5 arguments and other local variables. According to the online judge, the explicit stack-based solution uses only about 2/3 of the memory of the recursive solution.

However, the astute reader may note that we aren't quite doing the same thing. For example, in the explicit stack version, we assign the component before we push the vertex onto the stack, whereas in the recursive version, we assign the component after we retrieve the vertex from the stack. This is necessary to avoid adding a vertex to the stack twice. Consider the undirected graph with the edges $\{(0,1), (0,2), (1,2)\}$.

  1. First, we push $0$ onto the stack.
  2. We pop $0$ and check its edges, and push $1$ and $2$ onto the stack.
  3. Stacks are last in, first out (LIFO), so we pop $2$ off the stack. Now, we check the edges of $2$. That includes an edge to $1$. If we added $1$ to the stack without assigning it to a component yet, we would push $1$ onto the stack a second time.

We're also visiting the nodes in a different order. In the recursive solution, we would visit an unassigned vertex as soon as we see it, so we would have $0 \rightarrow 1 \rightarrow 2$, but in this explicit stack version, we get $0 \rightarrow 2 \rightarrow 1$.

An explicit stack version that better simulates what the recursive solution does is

vector<vector<int>> findComponents(const vector<set<int>> &edgeList) {
  int N = edgeList.size();
  vector<int> component(N, -1);  
  vector<vector<int>> componentSets;
  int currentComponent = 0;
  for (int i = 0; i < N; ++i) {
    if (component[i] == -1) {
      componentSets.emplace_back();
      stack<tuple<int, set<int>::iterator>> s; 
      s.emplace(i, edgeList[i].begin());
      while (!s.empty()) {
        // important to assign by reference
        tuple<int, set<int>::iterator> &state = s.top(); 
        int vertex = get<0>(state);
        set<int>::iterator &edgeIt = get<1>(state);        
        if (component[vertex] == -1) { // first time visiting
          component[vertex] = currentComponent;
          componentSets.back().push_back(vertex);
        }
        // if there is nothing else, pop
        bool shouldPop = true;
        while (edgeIt != edgeList[vertex].end()) {
          int j = *edgeIt; ++edgeIt;
          if (component[j] == -1) {
            s.emplace(j, edgeList[j].begin()); 
            shouldPop = false;  // something new is on stack, do not pop it
            break; // immediately put it on the stack, break, and visit next vertex
          } 
        }      
        if (shouldPop) s.pop();
      }
      ++currentComponent;
    }
  }  
  return componentSets;  
}

Now, we assign the component after getting the vertex off the stack, and when we see an unassigned vertex, we immediately put it on top of the stack instead of adding all unassigned vertices first. This uses slightly more memory because now the stack has to keep track of which edges were visited, so the state includes set<int>::iterator. Despite this explicit solution being more similar to recursion, I'd be hesitant to recommend this code because of how convoluted it is. You need to be very careful with references. Also, in more complicated cases, your state tuple will contain more than two variables, requiring you to keep track of more things and making it more likely that you will make a mistake.


Recently, I ran into a problem that forced me to recall a lot of old knowledge and use it in new ways, President and Roads. The problem is this:

We have $V$ vertices and $E$ edges. $2 \leq V \leq 10^5$, and $1 \leq E \leq 10^5$. Index these vertices $0, 1, \ldots, V - 1$. Edges are directed and weighted, and thus, can be written as $e_i = (a_i,b_i,l_i)$, where an edge goes from vertex $a_i$ to vertex $b_i$ with length $l_i$, where $1 \leq l_i \leq 10^6$. We're given a source $s$ and a target $t$. For each edge $e_i$, we ask:

If we take the shortest path from $s$ to $t$, will we always travel along $e_i$? If not, can we decrease $l_i$ to a positive integer so that we will always travel along $e_i$? If so, how much do we need to decrease $l_i$?

Shortest Path

Obviously, we need to find the shortest path. My first thought was to use Floyd-Warshall since I quickly realized that just the distances from $s$ wouldn't be enough. Computing the shortest path for all pairs is overkill, however. Let $d(a,b)$ be the distance between two vertices. An edge $e_i = (a_i,b_i,l_i)$ will belong to a shortest path if $$d(s,a_i) + l_i + d(b_i,t)$$ equals the length of the shortest path. Thus, we only need the distances from $s$ and the distances to $t$. We can find both by running Dijkstra's algorithm twice. To find the distances to $t$, we'll need to reverse the edges and pretend $t$ is our source. There's still the problem that in the simplest implementation of Dijkstra's, you need to find the vertex of minimum distance each time by linear search, which leads to an $O(V^2)$ algorithm. With a little bit of work, we can bring it down to $O\left((E + V)\log V\right)$ with a priority queue. Unfortunately, the built-in C++ and Java implementations do not support the decrease key operation. That leaves us with two options:

  1. The easiest is to reinsert a vertex into the priority queue, and keep track of already visited vertices. If we see the vertex a second time, throw it away (sketched below).
  2. Implement our own priority queue.
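
Here's a rough sketch of the first option in Java, where lazy deletion stands in for decrease key; the adjacency list of long[]{vertex, length} pairs is a hypothetical representation, not the one my actual solution below uses.

static long[] dijkstra(List<List<long[]>> adjacencyList, int s) {
    int n = adjacencyList.size();
    long[] distance = new long[n];
    Arrays.fill(distance, Long.MAX_VALUE/3);
    distance[s] = 0;
    // entries are {vertex, distance}; stale entries are discarded when popped
    PriorityQueue<long[]> pq = new PriorityQueue<>((a, b) -> Long.compare(a[1], b[1]));
    pq.add(new long[]{s, 0});
    while (!pq.isEmpty()) {
        long[] top = pq.poll();
        int u = (int) top[0];
        if (top[1] > distance[u]) continue; // u was already settled with a smaller distance
        for (long[] edge : adjacencyList.get(u)) {
            int v = (int) edge[0];
            long newDistance = distance[u] + edge[1];
            if (newDistance < distance[v]) {
                distance[v] = newDistance;
                pq.add(new long[]{v, newDistance}); // reinsert instead of decreasing the key
            }
        }
    }
    return distance;
}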

Of course, I chose to implement my own priority queue based on a binary heap with an underlying hash map to support the decrease key operation. If I were really ambitious, I'd implement a Fibonacci heap to achieve $O\left(E + V\log V\right)$ running time. See my C++ implementation below.

I use it in Dijkstra's algorithm here:

vector<pair<long long, unordered_set<int>>> computeShortestPath(int s, const vector<unordered_map<int, pair<int, int>>> &edgeList) {
  int N = edgeList.size();
  vector<pair<long long, unordered_set<int>>> distance(N, make_pair(LLONG_MAX/3, unordered_set<int>()));
  phillypham::priority_queue pq(N);
  distance[s].first = 0;
  pq.push(s, 0);
  while (!pq.empty()) {  
    int currentVertex = pq.top(); pq.pop();
    long long currentDistance = distance[currentVertex].first;
    // relax step
    for (pair<int, pair<int, int>> entry : edgeList[currentVertex]) {
      int nextVertex = entry.first; int length = entry.second.first;
      long long newDistance = currentDistance + length;
      if (distance[nextVertex].first > newDistance) {
        distance[nextVertex].first = newDistance;
        distance[nextVertex].second.clear();
        distance[nextVertex].second.insert(currentVertex);
        if (pq.count(nextVertex)) {
          pq.decrease_key(nextVertex, newDistance);
        } else {
          pq.push(nextVertex, newDistance);
        }
      } else if (distance[nextVertex].first == newDistance) {        
        distance[nextVertex].second.insert(currentVertex);
      }
    }
  }
  return distance;
}

computeShortestPath returns a vector of pairs, where each pair consists of the distance from $s$ and the possible previous vertices in a shortest path. Thus, I've basically recreated the graph only including the edges that are involved in some shortest path.

Counting Paths

The next task is to figure out which edges are necessary. There could be multiple shortest paths, so if the shortest path has length $L$, given edge $e_i = (a_i, b_i, l_i)$, the condition $$d(s,a_i) + l_i + d(b_i,t) = L$$ is necessary but not sufficient. First, we need to make sure $e_i$ is not duplicated, as there are no restrictions on multiple edges between the same pair of vertices. Secondly, every shortest path must pass through $a_i$ and $b_i$. And finally, we need to make sure that $\left|P(b_i)\right| = 1$, where $P(b_i)$ is the set of vertices that come directly before vertex $b_i$ on a shortest path, because you could imagine something like this scenario:

[Figure: a graph where the edges $s \rightarrow a$, $a \rightarrow c$, $c \rightarrow b$, and $b \rightarrow t$ have length 1, and the edge $a \rightarrow b$ has length 2.]

The shortest path is of length 4, and we have two options $s \rightarrow a \rightarrow c \rightarrow b \rightarrow t$ and $s \rightarrow a \rightarrow b \rightarrow t$. Every shortest path goes through $a$ and $b$, but the actual edge between $a$ and $b$ is not necessary to achieve the shortest path. Since we already computed the parents of $b_i$ when running computeShortestPath, it's simple to check that $b_i$ only has one parent.

Now, the only thing left to do is count the number of shortest paths that go through each vertex. For any vertex $v$, let $w$ be the number of ways to reach $v$ from $s$ and $x$ be the number of ways to reach $t$ from $v$. Then, the number of paths that pass through $v$ is $wx$. Now one could use something like depth-first search to compute $w_j$ for every vertex $j$. We'd start at $s$, take every path, and increment $w_j$ every time that we encounter $j$. This is too slow, though, as we'll visit every node multiple times. If there are $10^9$ paths to $t$, we'll take all $10^9$.

We can solve this problem by using a priority queue again, with a similar idea to Dijkstra's. The idea is that for a particular vertex $v$, $$w_v = \sum_{u \in P(v)} w_u,$$ where you may recall that $P(v)$ consists of all the parents of $v$. So, we just need to calculate the number of paths from $s$ to all of the parents first. Since every edge has positive length, all the parents will have shorter distance from $s$; thus, we can use a priority queue to get the vertices in increasing order of distance from $s$. Every time we visit a node, we'll update all its children, and we'll never have to visit that node again, so we can compute the number of paths in $O\left((E + V)\log V\right)$ time. See the code:

const int MOD_A = 1000000207;
const int MOD_B = 1000000007;
const int MOD_C = 999999733;

tuple<int, int, int> addPathTuples(tuple<int, int, int> a, tuple<int, int, int> b) {
  return make_tuple((get<0>(a) + get<0>(b)) % MOD_A, (get<1>(a) + get<1>(b)) % MOD_B, (get<2>(a) + get<2>(b)) % MOD_C);
}

tuple<int, int, int> multiplyPathTuples(tuple<int, int, int> a, tuple<int, int, int> b) {
  long long a0 = get<0>(a), a1 = get<1>(a), a2 = get<2>(a);
  return make_tuple((a0*get<0>(b)) % MOD_A, (a1*get<1>(b)) % MOD_B, (a2*get<2>(b)) % MOD_C);
}

vector<tuple<int, int, int>> countPaths(int s, 
                                        const vector<pair<long long, unordered_set<int>>> &distances,
                                        const vector<pair<long long, unordered_set<int>>> &children) {
  // assume only edges that make shortest paths are included
  int N = children.size();
  vector<tuple<int, int, int>> pathCounts(N, make_tuple(0, 0, 0)); // store as tuple, basically modular hash function
  pathCounts[s] = make_tuple(1, 1, 1);
  phillypham::priority_queue pq(N);
  pq.push(s, 0);
  while (!pq.empty()) {
    int currentVertex = pq.top(); pq.pop();  
    for (int nextVertex : children[currentVertex].second) {
      pathCounts[nextVertex] = addPathTuples(pathCounts[currentVertex], pathCounts[nextVertex]);
      if (!pq.count(nextVertex)) {
        pq.push(nextVertex, distances[nextVertex].first);
      }
    }    
  }
  return pathCounts;
}

$x$, the number of paths to $t$ from $v$, can be computed in much the same way by reversing the edges and starting at $t$.

Now, you might notice that instead of returning the number of paths, I return a vector of tuple<int, int, int>. This is because the number of paths is huge, exceeding what can fit in even an unsigned long long, which maxes out at $2^{64} - 1 = 18446744073709551615$. We don't actually care about the exact number of paths, though, just whether the number of paths through a vertex equals the total number of shortest paths. Thus, we can hash the path counts. I chose to hash them by reducing modulo three large primes close to $10^9$. See the complete solution below.

Hashing and Chinese Remainder Theorem

At one level, doing this seems somewhat wrong. If there is a collision, I'll get the wrong answer, so I decided to do some analysis on the probability of a collision. It occurred to me that hashing is actually just an equivalence relation, where two keys belong to the same bucket if they are equivalent, so buckets are equivalence classes.

Now, I choose 3 large primes $p_1 = 1000000207$, $p_2 = 1000000007$, and $p_3 = 999999733$, each around $10^9$. We'll run into an error if we get two numbers $x$ and $y$ such that \begin{align*} x &\equiv y \bmod p_1 \\ x &\equiv y \bmod p_2 \\ x &\equiv y \bmod p_3 \end{align*} since $x$ and $y$ will have the same hash. Fix $y$. Let $N = p_1p_2p_3$, $y \equiv a_1 \bmod p_1$, $y \equiv a_2 \bmod p_2$, and $y \equiv a_3 \bmod p_3$. Then, by the Chinese Remainder Theorem, for $x$ to have the same hash, we must have $$ x \equiv a_1\left(\frac{N}{p_1}\right)^{-1}_{p_1}\frac{N}{p_1} + a_2\left(\frac{N}{p_2}\right)^{-1}_{p_2}\frac{N}{p_2} + a_3\left(\frac{N}{p_3}\right)^{-1}_{p_3}\frac{N}{p_3} \bmod N,$$ where $$\left(\frac{N}{p_i}\right)^{-1}_{p_i}$$ is the modular inverse with respect to $p_i$, which we can find by the extended Euclidean algorithm as mentioned here.

We'll have that \begin{align*} \left(\frac{N}{p_1}\right)^{-1}_{p_1} &\equiv 812837721 \bmod p_1 \\ \left(\frac{N}{p_2}\right)^{-1}_{p_2} &\equiv 57354015 \bmod p_2 \\ \left(\frac{N}{p_3}\right)^{-1}_{p_3} &\equiv 129808398 \bmod p_3, \end{align*} so if $a_1 = 10$, $a_2 = 20$, and $a_3 = 30$, then the smallest such $x$ that would be equivalent to $y$ is $x = 169708790167673569857984349$. In general, \begin{align*} x &= 169708790167673569857984349 + Nk \\ &= 169708790167673569857984349 + 999999946999944310999613117k, \end{align*} for some $k \in \mathbb{Z}$, which leaves us with only an approximately $10^{-27}$ chance of being wrong. Thus, for all intents and purposes, our algorithm will never be wrong.
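
As a sanity check on these numbers, here's a small Java sketch that reconstructs the smallest nonnegative $x$ from the residues $(10, 20, 30)$; BigInteger is needed since $N \approx 10^{27}$ overflows a long. This check is separate from the solution itself.

import java.math.BigInteger;

public class CrtCheck {
    public static void main(String[] args) {
        BigInteger[] p = {BigInteger.valueOf(1000000207L),
                          BigInteger.valueOf(1000000007L),
                          BigInteger.valueOf(999999733L)};
        long[] a = {10, 20, 30};
        BigInteger N = p[0].multiply(p[1]).multiply(p[2]);
        BigInteger x = BigInteger.ZERO;
        for (int i = 0; i < 3; ++i) {
            BigInteger Ni = N.divide(p[i]);       // N/p_i
            BigInteger inv = Ni.modInverse(p[i]); // (N/p_i)^{-1} mod p_i
            x = x.add(BigInteger.valueOf(a[i]).multiply(inv).multiply(Ni)).mod(N);
        }
        System.out.println(x); // smallest nonnegative x with these residues
    }
}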

Binary Heap Priority Queue

namespace phillypham {
  class priority_queue {
  private:
    int keysSize;
    vector<int> keys;
    vector<long long> values;
    unordered_map<int, int> keyToIdx;
    int parent(int idx) {
      return (idx + 1)/2 - 1;
    }
    int left(int idx) {
      return 2*(idx+1) - 1;
    }
    int right(int idx) {
      return 2*(idx+1);
    }
    void heap_swap(int i, int j) {
      if (i != j) {
        keyToIdx[keys[j]] = i;
        keyToIdx[keys[i]] = j;
        swap(values[j], values[i]);
        swap(keys[j], keys[i]);
      }
    }
    void max_heapify(int idx) { // sifts down; this heap is a min-heap despite the name
      int lIdx = left(idx);
      int rIdx = right(idx);
      int smallestIdx = idx;
      if (lIdx < keysSize && values[lIdx] < values[smallestIdx]) {
        smallestIdx = lIdx;
      }
      if (rIdx < keysSize && values[rIdx] < values[smallestIdx]) {
        smallestIdx = rIdx;
      }
      if (smallestIdx != idx) {
        heap_swap(smallestIdx, idx);
        max_heapify(smallestIdx);
      } 
    }

    void min_heapify(int idx)  { // sifts up toward the root
      while (idx > 0 && values[parent(idx)] > values[idx]) {
        heap_swap(parent(idx), idx);
        idx = parent(idx);
      }
    }

  public:
    priority_queue(int N) {
      keysSize = 0;
      keys.clear(); keys.reserve(N);
      values.clear(); values.reserve(N);
      keyToIdx.clear(); keyToIdx.reserve(N);
    }

    void push(int key, long long value) {
      // if (keyToIdx.count(key)) throw logic_error("key " + ::to_string(key) + " already exists");
      int idx = keysSize; ++keysSize;
      if (keysSize > keys.size()) {
        keys.push_back(key);
        values.push_back(value);
      } else {
        keys[idx] = key;
        values[idx] = value;
      }   
      keyToIdx[key] = idx;
      min_heapify(idx);      
    }

    void increase_key(int key, long long value) {
      // if (!keyToIdx.count(key)) throw logic_error("key " + ::to_string(key) + " does not exist");
      // if (values[keyToIdx[key]] > value) throw logic_error("value " + ::to_string(value) + " is not an increase");
      values[keyToIdx[key]] = value;
      max_heapify(keyToIdx[key]);
    }

    void decrease_key(int key, long long value) {
      // if (!keyToIdx.count(key)) throw logic_error("key " + ::to_string(key) + " does not exist");
      // if (values[keyToIdx[key]] < value) throw logic_error("value " + ::to_string(value) + " is not a decrease");
      values[keyToIdx[key]] = value;
      min_heapify(keyToIdx[key]);       
    }

    void pop() {
      if (keysSize > 0) {
        heap_swap(0, --keysSize);
        keyToIdx.erase(keys[keysSize]);       
        if (keysSize > 0) max_heapify(0);
      } else {
        throw logic_error("priority queue is empty");
      }
    }

    int top() {
      if (keysSize > 0) {
        return keys.front();
      } else {
        throw logic_error("priority queue is empty");
      }
    }    

    int size() {
      return keysSize;
    }

    bool empty() {
      return keysSize == 0;
    }

    long long at(int key) {
      return values[keyToIdx.at(key)];
    }

    int count(int key) {
      return keyToIdx.count(key);
    }

    string to_string() {
      ostringstream out;
      copy(keys.begin(), keys.begin() + keysSize, ostream_iterator<int>(out, " "));
      out << '\n';
      copy(values.begin(), values.begin() + keysSize, ostream_iterator<long long>(out, " "));
      return out.str();
    }    
  };
}

Solution

int main(int argc, char *argv[]) {
  ios::sync_with_stdio(false); cin.tie(NULL);
  int N, M, s, t; // number of nodes, edges, source, and target
  cin >> N >> M >> s >> t;
  --s; --t;                     // 0 indexing
  vector<tuple<int, int, int>> edges;
  vector<unordered_map<int, pair<int, int>>> edgeList(N);
  vector<unordered_map<int, pair<int, int>>> reverseEdgeList(N);
  for (int m = 0; m < M; ++m) { // read in edges
    int a, b, l;
    cin >> a >> b >> l;
    --a; --b;
    edges.emplace_back(a, b, l);
    if (!edgeList[a].count(b) || edgeList[a][b].first > l) {
      edgeList[a][b] = make_pair(l, 1);
      reverseEdgeList[b][a] = make_pair(l, 1);
    } else if (edgeList[a][b].first == l) {
      ++edgeList[a][b].second;
      ++reverseEdgeList[b][a].second;
    }
  } 
  vector<pair<long long, unordered_set<int>>> distanceFromSource = computeShortestPath(s, edgeList);
  vector<pair<long long, unordered_set<int>>> distanceFromTarget = computeShortestPath(t, reverseEdgeList);
  vector<tuple<int, int, int>> pathCounts = countPaths(s, distanceFromSource, distanceFromTarget);
  vector<tuple<int, int, int>> backPathCounts = countPaths(t, distanceFromTarget, distanceFromSource);
  for (int i = 0; i < N; ++i) pathCounts[i] = multiplyPathTuples(pathCounts[i], backPathCounts[i]);

  long long shortestDistance = distanceFromSource[t].first;
  tuple<int, int, int> shortestPaths = pathCounts[s];
  for (tuple<int, int, int> edge : edges) {
    long long pathDistance = distanceFromSource[get<0>(edge)].first + get<2>(edge) + distanceFromTarget[get<1>(edge)].first;
    if (pathDistance == shortestDistance && // path is shortest
        pathCounts[get<0>(edge)] == shortestPaths && // every path goes through from node
        pathCounts[get<1>(edge)] == shortestPaths && // every path goes through to node
        distanceFromSource[get<1>(edge)].second.size() == 1 && // only paths come from the from node
        edgeList[get<0>(edge)][get<1>(edge)].second == 1) {
      cout << "YES" << '\n';
    } else if (pathDistance - get<2>(edge) + 1 < shortestDistance) {
      cout << "CAN" << ' ' << (pathDistance - shortestDistance) + 1 << '\n';
    } else {
      // even reducing the edge to length 1 isn't enough; this covers unreachable cities, where the distances are effectively infinite
      cout << "NO" << '\n';
    }
  }
  cout << flush;
  return 0;
}


One of my greatest achievements at Duke was getting an A+ in COMPSCI 100E (I think it is 201, nowadays?). It's a bit sad how little I have to be proud of, but my only other A+ came from an intro physics course unless you count study abroad. I'd also like to make a shoutout to my fellow math major Robert Won, who took COMPSCI 100E with me.

I did all the extra credit and every single Algorithmic Problem-solving Testing (APT) for full credit. I remember the hardest one being APT CountPaths. I remember neglecting all my other coursework and spending several days on this problem, which apparently isn't supposed to be that hard. I recently encountered a variation of this problem again, here, so I figured that I would spend some time writing up a solution. My original solution that I wrote back at Duke is a whopping 257 lines of Java. My current solution is fewer than 100 lines of C++, comments included.

Beginner Version

In the most basic version of the problem, we're trying to get from one corner to the other corner.

[Figure: a grid with start A in the upper-left corner and end B in the lower-right corner.]

Going from point A to B, only down and right moves are permitted. If the grid has $R$ rows and $C$ columns, we need to make $R-1$ down moves and $C-1$ right moves. The order doesn't matter, and any down move is indistinguishable from another down move. Thus, the number of paths is $$\boxed{\frac{(R + C - 2)!}{(R-1)!(C-1)!} = {{R + C - 2}\choose{R-1}}}.$$
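
For example, a $3 \times 3$ grid requires $2$ down moves and $2$ right moves, giving ${4 \choose 2} = 6$ paths.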

Intermediate Version

In another version, there are some squares that we are not allowed to visit, and the grid is not too big, say less than a billion squares.

[Figure: the grid with black squares $x_1$, $x_2$, $x_3$, and $x_4$ that we may not visit.]

Here, we are not allowed to visit black squares, $x_1$, $x_2$, $x_3$, and $x_4$. This version admits a dynamic programming solution that is $O(RC)$ in time complexity and $O\left(\max(R,C)\right)$ in memory since we only need to keep track of two rows or columns at a time. Let $P(r, c)$ be the number of paths to square $(r,c)$. We have that $$ P(r,c) = \begin{cases} 1, &\text{if $r = 0$ and $c=0$} \\ 0, &\text{if $(r,c)$ is a black square} \\ P(r-1,c) + P(r,c-1), &\text{if $0 \leq r < R$ and $0 \leq c < C$ and $(r,c) \neq (0,0)$} \\ 0, &\text{otherwise}. \end{cases} $$

We simply loop through rows and columns in order with our loop invariant being that at square $(r,c)$, we know the number of paths to squares $(r^\prime, c^\prime)$, where either $r^\prime < r$ or $r^\prime = r$ and $c^\prime < c$. See the code in C++ below with some variations since the problem asks for number of paths modulo 1,000,000,007.

#include <algorithm>
#include <iostream>
#include <vector>

using namespace std;

const int MOD = 1000000007;

int main(int argc, char *argv[]) {
  int h, w, n;
  cin >> h >> w >> n;  
  vector<vector<int>> grid(2, vector<int>(w, 0));
  vector<pair<int, int>> blackSquares;
  for (int i = 0; i < n; ++i) {
    int r, c;
    cin >> r >> c;
    blackSquares.emplace_back(r - 1, c - 1);
  }
  sort(blackSquares.begin(), blackSquares.end());
  grid.front()[0] = 1;               // guaranteed that 0,0 is white
  auto blackIt = blackSquares.begin();
  for (int c = 1; c < w; ++c) {
    if (blackIt != blackSquares.end() && blackIt -> first == 0 && blackIt -> second == c) {
      grid.front()[c] = -1;
      ++blackIt;
    } else if (grid.front()[c-1] > 0) { // cells right of a black square stay unreachable
      grid.front()[c] = 1;
    }
  }
  for (int r = 1; r < h; ++r) {
    if (blackIt != blackSquares.end() && blackIt -> first == r && blackIt -> second == 0) {
      grid.back()[0] = -1;
      ++blackIt;
    } else if (grid.front()[0] > 0) { // cells below a black square stay unreachable
      grid.back()[0] = 1;
    } else {
      grid.back()[0] = 0;
    }
    for (int c = 1; c < w; ++c) {
      if (blackIt != blackSquares.end() && blackIt -> first == r && blackIt -> second == c) {
        grid.back()[c] = -1;
        ++blackIt;
      } else {
        grid.back()[c] = 0;
        if (grid.front()[c] != -1) grid.back()[c] += grid.front()[c];
        if (grid.back()[c-1] != -1) grid.back()[c] += grid.back()[c-1];
        grid.back()[c] %= MOD;
      }
    }
    grid.front().swap(grid.back());
  }
  cout << grid.front()[w-1] << endl;
  return 0;
}

Advanced Version

Now, if we have a very large grid, the above algorithm will take too long. If $1 \leq R,C \leq 10^5$, we could have as many as 10 billion squares. Assume that the number of black squares satisfies $N \ll RC$ ($N$ much less than $RC$). Then, we can write an algorithm that runs in $O\left(N^2\log\left(\max(R,C)\right)\right)$.

First, sort the black squares. Given $x = (r_1, c_1)$ and $y = (r_2, c_2)$, let $x < y$ if either $r_1 < r_2$ or $r_1 = r_2$ and $c_1 < c_2$. Let $x_1,x_2,\ldots,x_N$ be the black squares, where $x_i = (r_i,c_i)$. For each $x_i$, associate the set $$S_i = \left\{x_j = (r_j, c_j) : r_j \leq r_i~\text{and}~c_j \leq c_i~\text{and}~x_j \neq x_i\right\}.$$

Recalling our previous diagram,

[Figure: the same grid with black squares $x_1$, $x_2$, $x_3$, and $x_4$.]

$S_1 = \emptyset$, $S_2 = S_3 = \{ x_1 \}$, $S_4 = \{x_1,x_3\}$, and $S_B = \{x_1, x_2, x_3, x_4\}$. Let us loop through these points in order and calculate the number of paths to $x_i$ avoiding all the squares in $S_i$. It's clear that for all $x_j \in S_i$, $j < i$, so we have our loop invariant: at $i$, for all $j < i$, we know the number of paths to $x_j$ that do not pass through any square in $S_j$. Assuming this, let us calculate the number of paths to $x_i$.

If $S_i$ is empty, the problem reduces to the beginner version and $$P(x_i) = P(r_i,c_i) = {{r_i + c_i - 2}\choose{r_i-1}}.$$ If $S_i$ is not empty, we need to subtract any paths that go through any $x_j \in S_i$, so if $i = 4$, we want to subtract paths like these red ones:

[Figure: the grid with the paths to $x_4$ through $x_1$ or $x_3$ highlighted in red.]

Thus, there are 3 types of paths we need to subtract: paths that go through $x_1$ and not $x_3$, paths that go through $x_3$ and not $x_1$, and paths that go through both $x_1$ and $x_3$.

It might be helpful to think about this as a Venn diagram:

[Figure: a Venn diagram of the paths through $x_1$ and the paths through $x_3$.]

We see that the set of these paths is the same as any path that goes through $x_1$ and paths that go through $x_3$ but not $x_1$. The number of paths that go through $x_1$ is $$P(x_1)\cdot{(r_4 - r_1) + (c_4 - c_1)\choose{r_4 - r_1}},$$ and since $x_1 \in S_3$, the number of paths that go through $x_3$ but not $x_1$ is $$P(x_3)\cdot{(r_4 - r_3) + (c_4 - c_3)\choose{r_4 - r_3}}.$$

Generalizing this, we have that $$P(x_i) = {{r_i + c_i - 2}\choose{r_i-1}} - \sum_{x_j \in S_i}P(x_j)\cdot{{(r_i-r_j) + (c_i - c_j)}\choose{r_i-r_j}},$$ which leads to the $N^2$ factor in the time complexity. Treating B as a black square, we can calculate the number of paths to B that avoid $x_1, x_2,\ldots,x_N$.

Now, the problem is further complicated by the fact that we need to return the number of paths modulo 1,000,000,007. Computing the binomial coefficient involves division, so we need to be able to find the modular multiplicative inverse. We can do this with the extended Euclidean algorithm, which gives us the $\log\left(\max(R, C)\right)$ factor in the time complexity.

Since $m = 10^9 + 7$ is prime, given any $n < m$, $\gcd(n,m) = 1$, so there exists $w,x \in \mathbb{Z}$ such that $wn + xm = 1$. We can find $w$ and $x$ by setting up a linear system, $$ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} n \\ m \end{pmatrix} = \begin{pmatrix} n \\ m \end{pmatrix}, $$ which can be rewritten as an augmented matrix, $$ \left(\begin{array}{cc|c} 1 & 0 & n\\ 0 & 1 & m \end{array}\right). $$ You can perform the Euclidean algorithm by doing elementary row operations until a $1$ appears in the rightmost column. The first and second column of that row is $w$ and $x$, respectively. Since the system remains consistent with $(n,m)^\intercal$ as a solution, we have $$wn + xm = 1 \Rightarrow wn = 1 - xm \equiv 1 \bmod m,$$ so the modular multiplicative inverse of $n$ is $w$.

If you memoize this step and precalculate all the modular multiplicative inverses up to $\max(R,C)$, you can actually achieve a $O(N^2)$ algorithm, but this is unnecessary for this problem. Here's the code below.

#include <algorithm>
#include <iostream>
#include <vector>

using namespace std;

const long long MOD = 1000000007LL;
const int FACTORIALS_LENGTH = 199999;
long long factorials[FACTORIALS_LENGTH];

long long invertMod(int n, int m) {
  /*
   * find the modular inverse with
   * extended euclidean algorithm
   * [w x][a] = [a] 
   * [y z][b] = [b]
   */
  n %= m;
  // use long long so products like q*x cannot overflow
  long long w = 1;
  long long x = 0;
  long long y = 0;
  long long z = 1;
  long long a = n;
  long long b = m;
  while (a != 0 && b != 0)  {
    if (a <= b) {
      int q = b / a;
      int r = b % a;
      b = r;
      z -= q*x; z %= m;
      y -= q*w; y %= m;
    } else if (a > b) {
      int q = a / b;
      int r = a % b;
      a = r;
      x -= q*z; x %= m;
      w -= q*y; w %= m;
    }
  }
  int inverseMod = a != 0 ? w : y;
  if (inverseMod < 0) inverseMod += m;
  return inverseMod;
}

long long choose(int n, int r) {
  long long res = (factorials[n]*invertMod(factorials[r], MOD)) % MOD;
  res *= invertMod(factorials[n-r], MOD);
  return res % MOD;
}

int main(int argc, char *argv[]) {
  factorials[0] = 1;
  for (int n = 1; n < FACTORIALS_LENGTH; ++n) factorials[n] = (factorials[n-1]*n) % MOD;
  int h, w, n;
  cin >> h >> w >> n;  
  vector<pair<int, int>> blackSquares;
  vector<long long> blackSquarePaths;
  for (int i = 0; i < n; ++i) {
    int r, c;
    cin >> r >> c;
    blackSquares.emplace_back(r - 1, c - 1);
    blackSquarePaths.push_back(0);
  }
  // just treat the end point as a black square
  blackSquares.emplace_back(h - 1, w - 1);
  blackSquarePaths.push_back(0);
  sort(blackSquares.begin(), blackSquares.end());
  for (int i = 0; i < blackSquares.size(); ++i) {
    // start by considering all paths
    blackSquarePaths[i] = choose(blackSquares[i].first + blackSquares[i].second, blackSquares[i].first);
    for (int j = 0; j < i; ++j) {
      // subtract unallowed paths, we already know row is smaller because we sorted already
      if (blackSquares[j].second <= blackSquares[i].second) {
        /*
         *  A
         *   B
         *    C
         * if we're trying to get to C, we need to subtract all the paths that go through, A, B, or both
         * we hit A first and subtract paths that go through A and both A and B
         * we hit B and subtract all paths that go through B but not A
         *
         * loop invariant is that we know how many paths to get to a 
         * black square by avoiding other black squares
         */
        int rDiff = blackSquares[i].first - blackSquares[j].first;
        int cDiff = blackSquares[i].second - blackSquares[j].second;
        blackSquarePaths[i] -= (blackSquarePaths[j]*choose(rDiff + cDiff, rDiff)) % MOD;
        blackSquarePaths[i] %= MOD;
        if (blackSquarePaths[i] < 0) blackSquarePaths[i] += MOD;
      }
    }
  }
  cout << blackSquarePaths.back() << endl;
  return 0;
}
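
For completeness, here's a sketch of that precomputation in Java, which my solution above does not use. Writing $m = \lfloor m/i \rfloor i + (m \bmod i)$ and multiplying by $i^{-1}(m \bmod i)^{-1}$ gives $i^{-1} \equiv -\lfloor m/i \rfloor (m \bmod i)^{-1} \bmod m$, so all the inverses follow in linear time.

// a sketch, assuming m is prime and 2 <= n <= m: inverses[i] = i^{-1} mod m
static long[] precomputeInverses(int n, long m) {
    long[] inverses = new long[n];
    inverses[1] = 1;
    for (int i = 2; i < n; ++i) {
        // i^{-1} = -(m/i) * (m % i)^{-1} mod m; m % i < i, so it's already computed
        inverses[i] = (m - (m / i) * inverses[(int) (m % i)] % m) % m;
    }
    return inverses;
}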

Duke APT CountPath Version

Now the version that I did at Duke has a few key differences. First, you start in the lower left and end up in the upper right. More importantly, you need to calculate the paths that go through 1 black square, 2 black squares, 3 black squares, etc. We need to do this up to paths that go through all $N$ black squares. Also, the grid is much smaller, and the black squares must be visited in the order given.

Since the grid is much smaller, a dynamic programming solution works, where the state is four variables $(r,c,k,l)$: $k$ is the index of the last black square visited, and $l$ is the number of black squares on that path. In general, if $T$ is the set of indices of black squares that are to the lower-left of $k$ and appear earlier in the given order, we have that $$ P(r,c,k,l) = \begin{cases} P(r-1,c,k,l) + P(r,c-1,k,l), &\text{if $(r,c)$ is not a black square} \\ \sum_{k^\prime \in T} P(r-1,c,k^\prime,l-1) + P(r,c-1,k^\prime,l-1), &\text{if $(r,c)$ is a black square.} \end{cases} $$ This is $O(RCN^2)$ in both time and space. Here is the Java code.

import java.util.Arrays;

public class CountPaths {

    private static final int MOD = 1000007;

    public int[] difPaths(int r, int c, int[] fieldrow, int[] fieldcol) { 
        // row, col, last special field, num of previous special fields,
        // need to add 1 to third index, in case there are 0 special fields
        int[][][][] memo = new int[r+1][c+1][fieldrow.length+1][fieldrow.length+1]; 

        int[][] isSpecial = new int[r+1][c+1];
        for (int i = 0; i <= r; ++i) { Arrays.fill(isSpecial[i], -1); };
        for (int i = 0; i < fieldrow.length; ++i) { isSpecial[fieldrow[i]][fieldcol[i]] = i; }
        // 00 is special, marks no special fields yet
        memo[1][0][0][0] = 1;   // so that memo[1][1] gets 1 path
        for (int i = 1; i <= r; ++i) {
            for (int j = 1; j <= c; ++j) {                
                int kLimit; int lStart;
                // handle 00 case where no special fields have been reached yet
                if (isSpecial[i][j] == -1) {
                    // l = 0 case, no special field came before it
                    memo[i][j][0][0] += memo[i-1][j][0][0] + memo[i][j-1][0][0];
                    memo[i][j][0][0] %= MOD;
                    kLimit = fieldrow.length; lStart = 1;
                } else {
                    // since this field is special min length is 1
                    memo[i][j][isSpecial[i][j]][1] += memo[i-1][j][0][0] + memo[i][j-1][0][0];
                    memo[i][j][isSpecial[i][j]][1] %= MOD;
                    kLimit = isSpecial[i][j]; lStart = 2;
                }                
                for (int l = lStart; l <= fieldrow.length; ++l) {
                    // only check special fields that come before
                    for (int k = 0; k < kLimit; ++k) {
                        // this point is only reachable from special field if it is up and to the right
                        if (i >= fieldrow[k] && j >= fieldcol[k]) {                        
                            if (isSpecial[i][j] == -1) {
                                // when the field is not special, length is the same
                                memo[i][j][k][l] += memo[i-1][j][k][l] + memo[i][j-1][k][l];
                                memo[i][j][k][l] %= MOD;
                            } else if (isSpecial[i][j] > -1) {
                                // when the field is special add 1 to the length
                                memo[i][j][kLimit][l] += memo[i-1][j][k][l-1] + memo[i][j-1][k][l-1];
                                memo[i][j][kLimit][l] %= MOD;
                            }
                        }
                    }
                }
            }            
        }                
        int[] paths = new int[fieldrow.length+1];
        for (int k = 0; k < fieldrow.length; ++k) {            
            for (int l = 1; l < fieldrow.length + 1; ++l) {
                paths[l] += memo[r][c][k][l];
                paths[l] %= MOD;
            }
        }
        paths[0] += memo[r][c][0][0]; paths[0] %= MOD;
        return paths;
    }
}

Using ideas from the advanced version, we can significantly reduce the memory needed. Basically, we represent each black square as a node and construct a directed graph, represented by an adjacency matrix, where the edges are weighted by the number of paths from one black square to another that avoid all other black squares. This takes $O(N^4)$ time and $O(N^2)$ space. Now, let the nodes be numbered $1,2,\ldots,N$, and we can calculate the paths that go through any number of black squares with some dynamic programming. Let $G(j,k)$ be the number of paths from black square $j$ to black square $k$ that don't go through any other black squares. Let $Q(k, l)$ be the number of paths that visit $l$ black squares, the last of which is $k$. We have that $$Q(k,l) = \sum_{j < k} Q(j, l-1)\cdot G(j,k), $$ keeping in mind that the nodes are sorted, and $G(j,k) = 0$ if the black squares are in the wrong order. Here's the C++ code below.

#include <algorithm>
#include <cassert>
#include <iostream>
#include <sstream>
#include <vector>
using namespace std;

class CountPaths { 
private:
  const int MOD = 1000007;
  vector<vector<int>> choose;
  vector<vector<int>> initializeChoose(int N) {
    choose = vector<vector<int>>();
    choose.push_back(vector<int>{1}); // choose[0]
    for (int n = 1; n < N; ++n) {
      choose.push_back(vector<int>(n+1));
      choose[n][n] = choose[n][0] = 1;
      for (int r = 1; r < n; ++r) {
        // out of the previous n - 1 things choose r things
        // and out of the previous n - 1 things choose r - 1 things and choose the nth thing
        choose[n][r] = choose[n-1][r] + choose[n-1][r-1];
        choose[n][r] %= MOD;
      }
    }
    return choose;
  }

  int computePaths(int R, int C, vector<pair<int, int>> specialFields) {
    // specialFields should already be sorted in this case
    specialFields.emplace_back(R, C);
    int N = specialFields.size();
    vector<int> paths = vector<int>(N);
    for (int i = 0; i < N; ++i) {
      paths[i] = choose[specialFields[i].first + specialFields[i].second][specialFields[i].first];
      for (int j = 0; j < i; ++j) {
        int rDiff = specialFields[i].first - specialFields[j].first; 
        int cDiff = specialFields[i].second - specialFields[j].second; 
        assert(rDiff >= 0);
        if (cDiff >= 0) {
          paths[i] -= (((long long) paths[j])*choose[rDiff + cDiff][rDiff]) % MOD;
          paths[i] %= MOD;
          if (paths[i] < 0) paths[i] += MOD;
        }
      }
    }
    return paths.back();
  }

  vector<tuple<int, int, int>> makeSpecialFields(int r, int c, vector<int> fieldrow, vector<int> fieldcol) {
    vector<tuple<int, int, int>> specialFields;
    bool originIsSpecial = false;
    bool endIsSpecial = false;
    int N = fieldrow.size();
    for (int i = 0; i < N; ++i) {
      if (fieldrow[i] == 1 && fieldcol[i] == 1) originIsSpecial = true;
      if (fieldrow[i] == r && fieldcol[i] == c) endIsSpecial = true;
      specialFields.emplace_back(fieldrow[i] - 1, fieldcol[i] - 1, i);
    }
    if (!originIsSpecial) specialFields.emplace_back(0, 0, -1);
    if (!endIsSpecial) specialFields.emplace_back(r - 1, c - 1, N);
    sort(specialFields.begin(), specialFields.end(),
         [](tuple<int, int, int> a, tuple<int, int, int> b) { 
           int ra = get<0>(a);
           int rb = get<0>(b);
           int ca = get<1>(a);
           int cb = get<1>(b);
           return ra < rb || (ra == rb && ca < cb) || (ra == rb && ca == cb && get<2>(a) < get<2>(b));
         });
    return specialFields;
  }

public: 
  vector<int> difPaths(int r, int c, vector<int> fieldrow, vector<int> fieldcol) { 
    initializeChoose(r + c - 1);    
    // make a directed graph of paths between without going through any other fields
    // tuple is (row, col, idx)
    vector<tuple<int, int, int>> specialFields = makeSpecialFields(r, c, fieldrow, fieldcol);
    int N = specialFields.size();
    // graph[i][j] is the number of paths going from i to j without going through any other special fields
    vector<vector<int>> graph(N, vector<int>(N, 0));
    for (int i = 0; i < N - 1; ++i) {
      for (int j = i + 1; j < N; ++j) {
        int rDiff = get<0>(specialFields[j]) - get<0>(specialFields[i]);
        int cDiff = get<1>(specialFields[j]) - get<1>(specialFields[i]);
        vector<pair<int, int>> intermediateFields;
        assert(rDiff >= 0);     // since fields are sorted
        if (cDiff >= 0 && get<2>(specialFields[i]) < get<2>(specialFields[j])) {
          for (int k = i + 1; k < j; ++k) {
            assert(get<0>(specialFields[i]) <= get<0>(specialFields[k]) && get<0>(specialFields[k]) <= get<0>(specialFields[j])); // since fields are sorted
            if (get<1>(specialFields[i]) <= get<1>(specialFields[k]) && get<1>(specialFields[k]) <= get<1>(specialFields[j])) {
              intermediateFields.emplace_back(get<0>(specialFields[k]) - get<0>(specialFields[i]), 
                                              get<1>(specialFields[k]) - get<1>(specialFields[i]));
            }
          }
          graph[i][j] = computePaths(rDiff, cDiff, intermediateFields);
        }
      }
    }  
    // first index refers to last special index (index in specialFields)
    // second index refers to length of path
    vector<vector<int>> pathsMemo(N, vector<int>(N + 1, 0));
    pathsMemo[0][1] = 1; // counting the origin as a special field, we have 1 path going through the origin that goes through 1 special field (the origin)
    for (int l = 2; l <= N; ++l) {
      for (int k = 0; k < N; ++k) {
        for (int j = 0; j < k; ++j) {
          pathsMemo[k][l] += ((long long) pathsMemo[j][l-1]*graph[j][k]) % MOD;
          pathsMemo[k][l] %= MOD;
        }
      }      
    }
    return vector<int>(pathsMemo[N-1].end() - fieldrow.size() - 1, pathsMemo[N-1].end());
  } 
};

A classic programming problem is to find all the primes up to a certain number. This problem admits a classic solution, the sieve of Eratosthenes. Here it is in Java.

/**
 * @param upper bound exclusive
 * @return a list of primes strictly less than upper
 */
public static Deque<Integer> findPrimes(int upper) {
    Deque<Integer> primes = new ArrayDeque<Integer>();
    if (upper <= 2) return primes;
    primes.add(2);
    boolean[] isPrime = new boolean[(upper-2)/2]; // index 0 is 3
    Arrays.fill(isPrime, true);
    for (int p = 3; p < upper; p += 2) {
        if (isPrime[p/2 - 1]) {
            primes.add(p);
            // only need to start from p^2 since we already checked p*m, where m < p
            for (long q = ((long) p)*((long) p); q < upper; q += 2*p) {
                isPrime[((int) q)/2 - 1] = false;
            }
        }            
    }
    return primes;
}

The problem is with the isPrime array. This solution is $O(n)$ in space and $O(n\log\log n)$ in computation. We may only be interested in finding large primes, from say 999,900,000 to 1,000,000,000, as in this problem PRIME1. It doesn't make sense to check numbers less than 999,900,000 or allocate space for them.

Hence, we use a segmented sieve. The underlying idea is that to check if a number $P$ is prime by trial division, we only need to check that it is not divisible by any prime numbers $q \leq \sqrt{P}$. Thus, if we want to find all the primes between $m$ and $n$, we first generate all the primes that are less than or equal to $\sqrt{n}$ with the traditional sieve. Let $S$ be the set of those primes.

Then, let $L$ be some constant number. We work in segments $[m, m + L)$, $[m + L, m + 2L)$, $\ldots$, $[m + kL, n + 1)$. In each of these segments, we identify all the multiples of the primes found in $S$ and mark them as not prime. Now, we only need $O(\max(|S|, L))$ space, and the computation is $$O\left(|S| \cdot \frac{n-m}{L} + (n-m)\right),$$ and we can set $L$ to be as large or small as we want.

By the prime number theorem, $|S|$ is not typically very large. Asymptotically, $$|S| = \pi(\sqrt{n}) \sim \frac{\sqrt{n}}{\log \sqrt{n}}.$$ For $L$, we have a tradeoff. If we have a large $L$, we may need a lot of space. If we have $L$ too small, each segment may not contain many multiples of the primes in $S$, which results in wasted computation. Here is the code with some tweaks to avoid even numbers.

/**
 * Find primes in range
 * @param lower bound, inclusive
 * @param upper bound exclusive
 * @param sieveSize space to use
 * @return list of primes in range
 */
public static Deque<Integer> findPrimes(int lower, int upper, int sieveSize) {
    if (lower >= upper) throw new IllegalArgumentException("lower must be less than upper");
    int sievingPrimesUpper = (int) Math.sqrt(upper);
    if (lower <= sievingPrimesUpper || sievingPrimesUpper <= 2) {
        Deque<Integer> primes = findPrimes(upper);
        if (!primes.isEmpty()) while (primes.peekFirst() < lower) primes.removeFirst();
        return primes;
    }                
    if (sieveSize < 5) sieveSize = 10;
    Deque<Integer> primes = new ArrayDeque<Integer>();        
    if (lower <= 2) primes.add(2);
    Deque<Integer> sievingPrimes = findPrimes(sievingPrimesUpper + 1);
    sievingPrimes.removeFirst(); // get rid of 2
    while (!sievingPrimes.isEmpty() &&
           sievingPrimes.getLast()*sievingPrimes.getLast() >= upper) sievingPrimes.removeLast();
    if (lower % 2 == 0) lower += 1; // make lower odd
    boolean[] isPrime = new boolean[sieveSize]; // isPrime[i] refers to lower + 2*i
    /** 
     * Find first odd multiple for each sieving prime. lower + 2*nextMultipleOffset[i]
     * will be the first odd multiple of sievingPrimes[i] that is greater than or
     * equal to lower.
     */
    int[] nextMultipleOffset = new int[sievingPrimes.size()];
    int idx = 0;
    for (int p : sievingPrimes) {
        int nextMultiple = lower - (lower % p); // make it a multiple of p
        if (nextMultiple < lower)  nextMultiple += p; // make it at least lower
        if (nextMultiple % 2 == 0) nextMultiple += p; // make sure it's odd
        nextMultipleOffset[idx++] = (nextMultiple - lower)/2;
    }
    while (lower < upper) {
        Arrays.fill(isPrime, true);
        idx = 0;
        for (int p : sievingPrimes) {
            int offset = nextMultipleOffset[idx];
            for (int j = offset; j < sieveSize; j += p) isPrime[j] = false;
            /** 
             * We want (lower + 2*sieveSize + 2*(nextMultipleOffset[idx] + k)) % p == 0
             * and (lower + 2*sieveSize + 2*(nextMultipleOffset[idx] + k)) % 2 == 1,
             * where k is the correction term. Second equation is always true. 
             * First reduces to 2*(sieveSize + k) % p == 0 ==> (sieveSize + k) % p == 0
             * since 2 must be invertible in the field F_p. Thus, we have that
             * k % p = (-sieveSize) % p. Then, we make sure that the offset is nonnegative.
             */
            nextMultipleOffset[idx] = (nextMultipleOffset[idx] - sieveSize) % p;
            if (nextMultipleOffset[idx] < 0) nextMultipleOffset[idx] += p;
            ++idx;
        }   
        for (int i = 0; i < sieveSize; ++i) {
            int newPrime = lower + i*2;
            if (newPrime >= upper) break;
            if (isPrime[i]) primes.add(newPrime);
        }
        lower += 2*sieveSize;
    }
    return primes;
}
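
For a range like the one in PRIME1, a single call then does the job; the sieve size of 100,000 here is an arbitrary choice.

Deque<Integer> primes = findPrimes(999900000, 1000000000, 100000);
for (int p : primes) System.out.println(p);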