Posts tagged dynamic programming

Recently, I came across two problems that required the convex hull trick. Often, we have a set of lines, and given a time $t$, we want to find out which line has the maximal (or minimal value). To make this more concrete, here's the first problem in which I came across this technique, Cyclist Race.

Chef has been invited to be a judge for a cycling race.

You may think about cyclists in the race as points on a line. All the cyclists start at the same point with an initial speed of zero. All of them move in the same direction. Some cyclists may force their cycles to speed up. The change of speed takes place immediately. The rest of the time, they move with a constant speed. There are N cyclists in total, numbered from 1 to N.

In order to award points to the participants, Chef wants to know how far the cyclists have reached at certain points of time. Corresponding to the above two, there will be two types of queries.

Change the speed of cyclist i at some time.

Ask for the position of the race leader at some time.

Return the results of the second type of query.

Now, it's not too hard to see that the distance traveled by each cyclist is a piecewise linear function. For each query of the second type, we could just evaluate all these peicewise linear functions and take the max, but there's a better way.

In this particular problem, the speed is always increasing. So if you look at the thick lines in the title picture that indicate which cyclist is in the lead, it forms the bottom of a convex hull, hence the name, the convex hull trick. The distance of the lead cyclist is also piecewise linear, so the goal becomes to merge the piecewise linear functions of all the cyclist into one.

Essentially, we'll want to create a vector $\left((x_0, l_0), (x_1,l_1),\ldots,(x_{n-1},l_{n-1})\right)$, where $l_i$ is a line, and $x_i$ is the $x$-coordinate at which the line becomes relavant. That is, line $l_i$ has the maximal value on the interval $\left[x_i, x_{i+1}\right]$, where $x_n = \infty$. In this way, if we sort our vector by $x_i$, then to find the maximal value at $x$, we can do a binary search.

To build this vector, we to first have all our lines sorted in ascending order by slope. We iterate through our lines, but we only want to keep some of them. Let us call our convex hull $C$. Initialize $C = \left((-\infty, l_0)\right)$. Now, suppose we encouter line $l$ and we need to make a decision. Recall that the slope $l$ will need be greater than or equal to any line line in $C$. There are several cases.

If $C$ has at least two lines, consider the previous line $l^\prime$ and the line before $l^\prime$, $l^{\prime\prime}$. $l^\prime$ becomes relevant at the intersection with $l^{\prime\prime}$ at say $x^\prime$. If $l$ intersects $l^{\prime\prime}$ at $x$ and $x \leq x^\prime$ $l$ becomes relevant before $l^\prime$, so we remove $l^\prime$. Repeat until we remove as many unnecessary lines as possible. Then, if append to our vector $(x,l)$ where $x$ is the $x$-coordinate of the intersection of $l$ and $l^\prime$.
If $C$ has only one line and $l$ has the same slope as that line, then only keep the line with the bigger $y$-intercept.
If $C$ has only one line and $l$ has a larger slope, then append to our vector $(x,l)$ where $x$ is the $x$-coordinate of the $l$ and the one line in $C$.

See the title picture for an example. We'd initialize with the green line $\left(-\infty, y = 0.1x\right)$. Now, we're in case 3, so we'd add the blue line $(0, y = 0.2x)$ since the two lines intersect at $(0,0)$. For the thick red line, we find ourselves in case 1. We'll pop off the blue line and find ourselves in case 3 again, so now our vector is $C = \left((-\infty, y= 0.1x),(0,y=x/3)\right)$.

The next line we encounter is the green line $y = 0.5x - 0.8$. We're in case 1, but we can't pop off any lines since its intersection with other two lines is far to the right, so we simply append this line and its intersection with the thick red line, so we have $$C = \left((-\infty, y= 0.1x), (0,y=x/3), (4.8,y=0.5x-0.8)\right).$$ Our next line is thick blue line, $y = 2x/3 - 1.4$. This intersects the thick red line at $x = 4.2$, while the thick pervious green line intersects at $x =4.8$, so we'll pop off the green line $y = 0.5x-0.5$, and push the thick blue line on to the stack and get $C = \left((-\infty, y= 0.1x), (0,y=x/3), (4.2,y=2x/3-1.4)\right)$. Finally, we encounter the thick green line and our convex hull is $$ C = \left((-\infty, y= 0.1x), (0,y=x/3), (4.2,y=2x/3-1.4),(5.4, y=2x-8.6)\right). $$

Hopefully, it's clear that the natural data structure to keep track of the lines is a vector, which we'll use as a stack. Then, adding new lines is just a constant time operation. The most time-consuming operation is the initial sorting of the lines, so constructing the convex hull is $O(N\log N)$, where $N$ is the number of lines. Evaluating the distance at some $x$ is $O(\log N)$ using binary search as we discussed earlier. If we have $M$, queries the total running time will be $O\left((N + M)\log N \right)$.

Here's the code for this problem.

#include <algorithm>
#include <iostream>
#include <utility>
#include <vector>

using namespace std;

class Line {
private:
  long double m, b; // y = mx + b
public:
  Line(long double m, long double b) : m(m), b(b) {}
  long double& slope() { return m; }
  long double& yIntercept() { return b; }  
  pair<long double, long double> intersect(Line other) {
    long double x = (this -> b - other.yIntercept())/(other.slope() - this -> m);
    long double y = (this -> m)*x + this -> b;
    return make_pair(x, y);
  }
  double getY(long double x) { return m*x + b; }
};

vector<pair<long double, Line>> makeConvexHull(vector<Line> &lines) {
  int N = lines.size(); 
  vector<pair<long double, Line>> convexHull; convexHull.reserve(N);
  if (N == 0) return convexHull;
  convexHull.emplace_back(0, lines.front());
  for (int i = 1; i < N; ++i) {
    // pop stack while new line's interesection with second to last line is to the left of last line
    // note that slopes are strictly increasing
    while (convexHull.size() > 1 &&
           lines[i].intersect(convexHull[convexHull.size() - 2].second).first <= convexHull.back().first) {
      convexHull.pop_back();
    }
    // check intersection with x = 0 when there's only 1 line
    if (convexHull.size() == 1 && lines[i].yIntercept() >= convexHull.back().first) convexHull.pop_back();
    convexHull.emplace_back(lines[i].intersect(convexHull.back().second).first, lines[i]);
  }
  return convexHull;
}

int main(int argc, char *argv[]) {
  ios::sync_with_stdio(false); cin.tie(NULL);
  int N, Q; cin >> N >> Q;      // number of cyclists and queries
  // current speed, time change, distance traveled so far
  vector<Line> lines; // y = mx + b, first is m, second is b
  lines.emplace_back(0,0);
  vector<pair<int, long long>> cyclists(N, make_pair(0, 0)); // first is speed v, second is x, where x + vt is location at time t
  vector<long long> queryTimes;
  for (int q = 0; q < Q; ++q) {
    int queryType; cin >> queryType;
    long long time;
    switch(queryType) {
    case 1:                     // speed change
      int cyclist, newSpeed;
      cin >> time >> cyclist >> newSpeed;
      --cyclist;   // convert to 0 indexing
      // suppose current function is x + vt
      // new function is x + v*time + (t - time)*newSpeed
      // (x + v*time - time*newSpeed) + newSpeed*t
      // = (x + (v-newSpeed)*time) + newSpeed*t
      cyclists[cyclist].second += time*(cyclists[cyclist].first - newSpeed);
      cyclists[cyclist].first = newSpeed;      
      lines.emplace_back(newSpeed, cyclists[cyclist].second);
      break;
    case 2:                     // leader position
      cin >> time;
      queryTimes.push_back(time);
      break;
    }
  }
  // sort slopes in ascending order
  sort(lines.begin(), lines.end(),
       [](Line &a, Line &b) -> bool {
         if (a.slope() != b.slope()) return a.slope() < b.slope();
         return a.yIntercept() < b.yIntercept();
       }); 
  if (lines.size() == 1) {      // there will always be at least 1 line
    for (long long t : queryTimes) {
      cout << (long long) lines.back().getY(t) << '\n';
    }
  } else {
    // eliminate irrelevant lines, first is time where the line becomes relevant
    vector<pair<long double, Line>> convexHull = makeConvexHull(lines); 
    // since times are strictly increasing just walk through lines 1 by 1
    int cursor = 0;
    for (long long t : queryTimes) {
      while (cursor + 1 < convexHull.size() && convexHull[cursor + 1].first <= t) ++cursor;
      cout << (long long) convexHull[cursor].second.getY(t) << '\n';
    }
  }
  cout << flush;
  return 0;
}

In this particular problem, negative times make no sense, so we can start at $x = 0$ for our convex hull.

Applications to Dynamic Programming

In certain cases, we can apply this trick to a dynamic programming problem to significantly improve the running time. Consider the problem Levels and Regions.

Radewoosh is playing a computer game. There are $N$ levels, numbered $1$ through $N$. Levels are divided into $K$ regions (groups). Each region contains some positive number of consecutive levels.

The game repeats the the following process:

If all regions are beaten then the game ends immediately. Otherwise, the system finds the first region with at least one non-beaten level. Let $X$ denote this region.

The system creates an empty bag for tokens. Each token will represent one level and there may be many tokens representing the same level.

For each already beaten level $i$ in the region $X$, the system adds $t_i$ tokens to the bag (tokens representing the $i$-th level).

Let $j$ denote the first non-beaten level in the region $X$. The system adds $t_j$ tokens to the bag.

Finally, the system takes a uniformly random token from the bag and a player starts the level represented by the token. A player spends one hour and beats the level, even if he has already beaten it in the past.

Given $N$, $K$ and values $t_1,t_2,\ldots,t_N$, your task is to split levels into regions. Each level must belong to exactly one region, and each region must contain non-empty consecutive set of levels. What is the minimum possible expected number of hours required to finish the game?

It's not obvious at all how the convex hull trick should apply here. Well, at least it wasn't obvious to me. Let's do some math first. Consider the case where we only have 1 region first consisting of levels $1,2,\ldots,n$. Let $T_i$ be the time at which we finish level $i$. Write \begin{equation} T_n = T_1 + (T_2 - T_1) + (T_3 - T_2) + \cdots + (T_n - T_{n-1}). \label{eqn:expectation_Tn} \end{equation}

$T_1 = 1$ since we'll just put $t_i$ tokens in the bag and draw from those $t_i$ tokens, so $t_i/t_i = 1$, so we always play and beat the first level immediately. Now $T_{i} - T_{i-1}$ is the time that it takes to be level $i$ given that levels $1,2,\ldots,i-1$ are beaten. This has a geometric distribution. Every time we try to beat level $i$, we'll put in $t_1 + t_2 + \cdots + t_i$ tokens in the back and try to get one of the $t_i$ tokens labeled $i$. The probability of doing so is \begin{equation} p = \frac{t_i}{\sum_{j=1}^i t_j}. \end{equation} Thus, we'll have that \begin{align} \mathbb{E}(T_i - T_{i-1}) &= p + 2(1-p)p + 3(1-p)p + \cdots = \sum_{k=1}^\infty k(1-p)^{k-1}p \nonumber\\ &= \frac{p}{\left(1-(1-p)\right)^2} = \frac{1}{p} \nonumber\\ &= \frac{\sum_{j=1}^i t_j}{t_i}. \label{eqn:expectation_delta_T} \end{align}

Now by linearity of expectation, applying Equation \ref{eqn:expectation_delta_T} to Equation \ref{eqn:expectation_Tn} will give us that \begin{equation} \mathbb{E}T_n = \sum_{i = 1}^n\frac{\sum_{j=1}^i t_j}{t_i}. \label{eqn:expection_T_n} \end{equation}

Now, define $E_{k,n}$ denote the minimum expected time to beat $n$ levels if we split them into $k$ regions. Note that each region must have at least 1 level, so this is only well-defined if $n \geq k$. Now, the levels must be beaten in order, so imagine putting levels $t_l,t_{l+1},\ldots,t_n$ into the last region. Since each region must have at least 1 level, we'll have that $k \leq l \leq n$, which gives us the recurrence relation \begin{equation} E_{k,n} = \inf_{k \leq l \leq n}\left\{E_{k - 1, l - 1} + \sum_{i = l}^n\frac{\sum_{j=l}^i t_j}{t_i}\right\} \label{eqn:expectation_recurrence} \end{equation} by Equation \ref{eqn:expection_T_n}.

Now, suppose we define \begin{equation} S_k = \sum_{i=1}^k t_i ~\text{and}~ R_k = \sum_{i=1}^k \frac{1}{t_i}, \label{eqn:sum_dp} \end{equation} which leads to \begin{equation} E_{1,n} = \mathbb{E}T_n = \sum_{i=1}^n\frac{S_i}{t_i} \label{eqn:base_expectation} \end{equation} Equation \ref{eqn:expectation_recurrence} becomes \begin{align} E_{k,n} &= \inf_{k \leq l \leq n}\left\{E_{k - 1, l - 1} + \sum_{i = l}^n\frac{\sum_{j=l}^i t_j}{t_i}\right\} \nonumber\\ &= \inf_{k \leq l \leq n}\left\{E_{k - 1, l - 1} + \sum_{i = l}^n\frac{S_i - S_{l-1}}{t_i}\right\} \nonumber\\ &= \inf_{k \leq l \leq n}\left\{E_{k - 1, l - 1} + \sum_{i = l}^n\frac{S_i}{t_i} - S_{l-1}\left(R_n - R_{l-1}\right)\right\} \nonumber\\ &= \inf_{k \leq l \leq n}\left\{E_{k - 1, l - 1} + \left(E_{1,n} - E_{1,l-1}\right) - S_{l-1}\left(R_n - R_{l-1}\right)\right\} \nonumber\\ &= E_{1,n} + \inf_{k \leq l \leq n}\left\{\left(E_{k - 1, l - 1} - E_{1,l-1} + S_{l-1}R_{l-1}\right) - S_{l-1}R_n\right\} \label{eqn:expectation_line} \end{align} by Equations \ref{eqn:base_expectation} and \ref{eqn:sum_dp}.

Do you see the lines of decreasing slope in Equation \ref{eqn:expectation_line}, yet? It's okay if you don't. Look at the expression inside the $\inf$. The $y$-intercept is in parentheses and the slope is $-S_{l-1}$ which is decreasing with $l$. So index our lines by $l$.

Fix $k$. Imagine that we have calculated $E_{j,n}$ for all $j < k$. Now, we're going to attempt to calculate $E_{k,k},E_{k,k+1},\ldots, E_{k,N}$ in that order. To calculate $E_{k,n}$ we only need to consider the lines $l = k,\ldots,n$. The intercept does not vary as a function of $n$, so we can add lines one-by-one. Before calculating $E_k,n$, we'll add the line with slope $-S_{n-1}$. Now our answer will be $E_{K,N}$.

In this way, it simplifies a $O(KN^2)$ problem into a $O(KN\log N)$ problem. Notice that here, instead of building our convex hull before making any queries like in the previous problem, we dynamically update it. Here's the code with some differences since $0$-indexing was used in the code.

#include <algorithm>
#include <iomanip>
#include <iostream>
#include <utility>
#include <vector>

using namespace std;

class Line {
private:
  long double m, b; // y = mx + b
public:
  Line(long double m, long double b) : m(m), b(b) {}
  long double slope() { return m; }
  long double& yIntercept() { return b; }  
  pair<long double, long double> intersect(Line other) {
    long double x = (this -> b - other.yIntercept())/(other.slope() - this -> m);
    long double y = (this -> m)*x + this -> b;
    return make_pair(x, y);
  }
  double getY(long double x) { return m*x + b; }
};

int main(int argc, char *argv[]) {
  ios::sync_with_stdio(false); cin.tie(NULL);
  int N, K; cin >> N >> K; // number of levels and regions
  vector<int> T; T.reserve(N);  // tokens
  for (int i = 0; i < N; ++i) {
    int t; cin >> t;
    T.push_back(t);
  }
  vector<long long> S; S.reserve(N); // cumulative sum of tokens
  vector<long double> R; R.reserve(N); // cumulative sum of token reciprocals
  S.push_back(T.front());
  R.push_back(1.0/T.front());
  for (int n = 1; n < N; ++n) {
    S.push_back(S.back() + T[n]);
    R.push_back(R.back() + 1.0L/T[n]);
  }
  /* let eV[k][n] be the minimum expected time to beat
   * levels 0,1,...,n-1 spread across regions 0,1,...,k-1
   */ 
  vector<vector<long double>> eV(K, vector<long double>(N)); // only indices eV[k][n] with n >= k are valid
  eV.front().front() = 1;
  for (int n = 1; n < N; ++n) { // time to beat each level is a geometric distribution
    eV[0][n] = eV[0][n-1] + ((long double) S[n])/T[n];
  }
  /* eV[k][n] = min(eV[k-1][l-1] + (S[l]-S[l-1])/t_l + ... + (S[n] - S[l-1])/t_{n}),
   * where we vary k <= l < n. That is, we're putting everything with index at least l
   * in the last region. Note that each region needs at least 1 level. This simplifes to
   * eV[k][n] = min(eV[k-1][l-1] + eV[0][n] - eV[0][l-1]  - S[l-1](R[n] - R[l-1])
   *          = eV[0][n] + min{(eV[k-1][l-1] - eV[0][l-1] + S[l-1]R[l-1]) - S[l-1]R[n]},
   * Thus, for each l we have a line with slope -S[l - 1].
   * -S[l-1] is strictly decreasing and R[n] is strictly increasing.
   * Use the convex hull trick.
   */ 
  vector<pair<long double, Line>> convexHull; convexHull.reserve(N);
  for (int k = 1; k < K; ++k) {    
    /* in convex hull we add lines in order of decreasing slope, 
     * the the convexHull[i].first is the x-coordinate where 
     * convexHull[i].second and convexHull[i-1].second intersect
     */
    convexHull.clear();
    int cursor = 0;
    for (int n = k; n < N; ++n) { // calculate eV[k][n]
      /* add lines l = k,...,n to build convex hull
       * loop invariant is that lines l = k,...,n-1 have been added, so just add line l = n
       */
      long double slope = -S[n - 1];
      long double yIntercept = eV[k - 1][n - 1] - eV[0][n - 1] + S[n - 1]*R[n - 1];
      Line l(slope, yIntercept);
      while (convexHull.size() > 1 && 
             convexHull.back().first >= l.intersect(convexHull[convexHull.size() - 2].second).first) {
        convexHull.pop_back(); // get rid of irrelevant lines by checking that line intersection is to the left
      }
      // check intesection with x = 0 if no lines left
      if (convexHull.size() == 1 && l.yIntercept() <= convexHull.back().second.yIntercept()) convexHull.pop_back();
      convexHull.emplace_back(convexHull.empty() ? 0 : l.intersect(convexHull.back().second).first, l); // add line
      /* binary search for the correct line since they are sorted by x intersections
       * lower bound is old cursor since x coordinate of intersection is monotonically decreasing
       * and R[n] is increasing, too
       */ 
      if (cursor >= convexHull.size()) cursor = convexHull.size() - 1;
      cursor = upper_bound(convexHull.begin() + cursor, convexHull.end(), R[n], 
                           [](long double x, pair<long double, Line> l) { return x < l.first; }) - convexHull.begin();
      --cursor; // since we actually want the last index that is less than or equal to
      eV[k][n] = eV[0][n] + convexHull[cursor].second.getY(R[n]);

    }
  }
  cout << fixed;
  cout << setprecision(10);
  cout << eV.back().back() << endl;
  return 0;
}

By the way the drawing was done with GeoGebra. It's a pretty cool way to do math.

Tags: math, algorithm, dynamic programming, codechef, codeforces, probability

About Me

Posts tagged dynamic programming

Convex Hull Trick

Applications to Dynamic Programming