Posts tagged hackerrank
Recently, I came across a programming interview question that I didn't know how to answer. It turns out the solution was to use a suffix array. I had actually been aware of the existence of the data structure for some time, but never bothered to learn about them because they always made my head hurt.
They're actually pretty simple to describe. Consider a string $S$ of length $N.$ Then, $S$ will have $N$ suffixes, $S_0,S_1,\ldots,S_{N-1}$, where if we use $0$-based indexing, $S_i$ is just the suffix of $S$ starting at the $i$th character. Thus, we can just represent each suffix its index which, is just an integer. Sort the suffixes alphabetically. A suffix array is sorted array of suffixes of $S$. Strings can be very long, for example, a DNA sequence. Thus, we represent our suffix array as the array of integers that corresponds to each suffix.
The concept is very simple, but I always got lost on the construction. Of course, we could do a comparison sort, and so, given $N$ suffixes, we have about $O(N\log N)$ comparisons. The problem is that for each comparison, we have to compare $O(N)$ characters, so this naive problem is actually $O(N^2\log N)$ which is unacceptable for large $N.$
Suffix Array Construction
There exists $O(N)$ algorithms of constructing suffix arrays, but they are too unwieldly to code during a contest. The algorithm that I describe is $O\left(N(\log N)^2\right)$. The idea reminds me of a radix sort, where would sort a string character-by-character.
Here, we can do better than character-by-character, though, because a suffix of a suffix is just a further suffix. So, we sort by the first character. Now, say we have sorted all the suffixes by the first $k$ characters. We consider the suffix $i$ as the pair of suffixes $(i,i+k).$ The first $k$ characters of both of those suffixes are sorted, so actually we can easily sort by $2k$ characters by comparing pairs of integers. Thus, we at each step we have double the number of characters being compared, so we run a comparison sort $\lceil \log_2 N \rceil + 1$ times, which gives us a running time of complexity $O\left(N(\log N)^2\right)$.
In the title image, you can see this process in action. I thought it was pretty neat idea, so I created this data visualization, where you can construct your own suffix array from a string. Here's the an implementation in Java, where I use a custom comparator that keeps track of the rank of each suffix.
static class SuffixArray {
private class SuffixComparator implements Comparator<Integer> {
private String S;
private int N;
private int[] rankA;
private int[] rankB; // dummy array to avoid constantly reallocating memory
private boolean useRankB = false;
private int sortedCharacters = 0;
public SuffixComparator(String S) {
this.S = S;
this.N = S.length();
this.rankA = new int[N];
this.rankB = new int[N];
}
public int compare(Integer a, Integer b) {
int[] rank = useRankB ? rankB : rankA;
if (rank[a] != rank[b]) return rank[a] - rank[b];
if (sortedCharacters == 0) { // just look at first letter
return S.charAt(a) - S.charAt(b);
} else {
// first sortedCharacters are equal
// substring after sortedCharacters is
// S.substring(a + sortedCharacters) and S.substring(b + sortedCharacters)
// these are just further suffixes
if (a + sortedCharacters == N) {
return -1; // suffix a is only sortedCharacters long
} else if (b + sortedCharacters == N) {
return 1; // suffix b is only sortedCharacters long
} else { // look at sub-suffixes
return rank[a + sortedCharacters] - rank[b + sortedCharacters];
}
}
}
public int getRank(int suffixIndex) {
return useRankB ? rankB[suffixIndex] : rankA[suffixIndex]; // return position in suffixArray
}
public void updateRank(Integer[] suffixArray) {
int[] newRank = useRankB ? rankA : rankB;
newRank[suffixArray[0]] = 0;
for (int i = 1; i < N; ++i) { // loop over position in suffix array
if (compare(suffixArray[i], suffixArray[i - 1]) > 0) {
newRank[suffixArray[i]] = newRank[suffixArray[i - 1]] + 1;
} else {
newRank[suffixArray[i]] = newRank[suffixArray[i - 1]];
}
}
toggleRank();
}
public int incrementSortedCharacters() {
if (sortedCharacters >= N) return sortedCharacters;
if (sortedCharacters == 0) {
++sortedCharacters;
} else {
sortedCharacters *= 2;
}
return sortedCharacters;
}
private boolean toggleRank() {
useRankB = !useRankB;
return useRankB;
}
}
private String S;
private Integer[] suffixArray; // suffixArray[i] is the ith suffix lexographically
private int N;
private int[] rank; // rank[suffixIndex] is the position in the suffixArray
public SuffixArray(String S) {
this.S = S;
this.N = S.length();
this.suffixArray = new Integer[N];
for (int i = 0; i < N; ++i) suffixArray[i] = i;
SuffixComparator suffixComparator = new SuffixComparator(S);
do { // each iteration sorts 2^i characters
Arrays.sort(suffixArray, suffixComparator);
suffixComparator.updateRank(suffixArray);
} while (suffixComparator.incrementSortedCharacters() < N);
this.rank = new int[N];
for (int i = 0; i < N; ++i) rank[i] = suffixComparator.getRank(i);
}
public int getSuffixIndex(int i) {
return suffixArray[i];
}
public String getSuffix(int i) {
return S.substring(suffixArray[i]);
}
public int getRank(int suffixIndex) {
return rank[suffixIndex];
}
public int size() {
return N;
}
public String getString() {
return this.S;
}
@Override
public String toString() {
StringBuilder stringBuilder = new StringBuilder();
if (N == 0) return "empty suffix array";
stringBuilder.append(getSuffix(0));
for (int i = 1; i < N; ++i) {
stringBuilder.append('\n');
stringBuilder.append(getSuffix(i));
}
return stringBuilder.toString();
}
}
LCP Arrays
For many applications of a suffix array, an auxiliary data structure called an LCP array is needed. LCP stands for longest common prefix. Let $SA_i$ be the $i$th suffix of the suffix array, so it is the suffix of rank $i$ when the suffixes are sorted alphabetically. The LCP array will also have length $N,$ with entries $LCP_i.$ $LCP_0$ is undefined, while for $i > 0,$ $LCP_i$ is the length of longest common prefix of $SA_i$ and $SA_{i-1}.$
Again, a naive computation would be $O(N^2),$ but we can use the fact that if we remove the first character from a suffix, we have a prefix of another suffix. So, start with the longest suffix, that is, $S$, itself. Compute the longest common prefix with the suffix that occurs before it in the suffix array. That is, suppose we know the longest common prefix of $SA_i$ and $SA_{i-1}$ has length $k > 0.$ Now, remove the first character from $SA_i$ to get $SA_j.$ The prefix of $SA_{j-1}$ will share at least $k - 1$ characters with $SA_j$ because in the very worst case $SA_{j-1}$ will be be $SA_{i-1}$ with the first character removed. We handle the case where $i = 0$ or $k = 0,$ specially. Here is the Java code.
static class LCPArray { // LCP stands for longest common prefix
private int N;
private String S;
private SuffixArray suffixArray;
private int[] lCP;
// lCP[i] is undefined for i = 0, i is position in suffixArray
// for i > 0, it contains the length of the longest common prefix
// of S.substring(suffixArray[i]) and S.substring(suffixArray[i-1])
public LCPArray(String S) {
this(new SuffixArray(S));
}
public LCPArray(SuffixArray suffixArray) {
this.suffixArray = suffixArray;
this.S = suffixArray.getString();
this.N = suffixArray.size();
this.lCP = new int[N];
if (N == 0) return;
lCP[0] = -1; // undefined for 0
if (N == 1) return;
int currentLCP = 0;
for (int suffixIndex = 0; suffixIndex < N; ++suffixIndex) {
int rank = suffixArray.getRank(suffixIndex);
if (rank == 0) continue;
int previousSuffixIndex = suffixArray.getSuffixIndex(suffixArray.getRank(suffixIndex) - 1);
// loop invariant is that the current suffix and previous suffix have longest common prefix
// of at least currentLCP
while (Math.max(suffixIndex, previousSuffixIndex) + currentLCP < N
&& S.charAt(suffixIndex + currentLCP) == S.charAt(previousSuffixIndex + currentLCP)) {
++currentLCP;
}
lCP[rank] = currentLCP;
// we only remove one character off the front each time, so suffix length decreases by at most 1
if (currentLCP > 0) --currentLCP;
}
}
public int size() {
return this.N;
}
public int get(int i) {
return this.lCP[i];
}
public SuffixArray getSuffixArray() {
return this.suffixArray;
}
}
Case Study
The problem that required me to learn these two data structures was String Similarity. Here's the problem statement.
For two strings $A$ and $B$, we define the similarity of the strings to be the length of the longest prefix common to both strings. For example, the similarity of strings "abc" and "abd" is 2, while the similarity of strings "aaa" and "aaab" is 3.
Calculate the sum of similarities of a string $S$ with each of it's suffixes.
So, we find $S$ in our suffix array. Say $S = SA_i.$ The two suffixes with longest common prefixes of $S$ will be $SA_{i-1}$ and $SA_{i+1},$ which we've already calculated in $LCP_{i}$ and $LCP_{i+1},$ respectively.
The next two suffixes with longest common prefixes will be $LCP_{i-2}$ and $LCP_{i+2}.$ Now, we have to be careful because we could run into a situation like this. Consider $S = aabbabab.$ Here are the first 3 entries of the suffix array.
aabbabab
ab
abab
The longest common prefix of $S_0$ with $S_2$ is not $LCP_{2},$ but $\min\left(LCP_1,LCP_2\right),$ in general the longest common prefix of $SA_{i}$ with $SA_{j},$ where $i < j$ is $\min\left(LCP_{i+1},LCP_{i+2},\ldots,LCP_j\right).$ Thus, with our two data structures, the solution is quite short.
import java.io.*;
import java.util.*;
public class Solution {
static class SuffixArray {
...
}
static class LCPArray { // LCP stands for longest common prefix
...
}
static long sumSuffixSimilarities(String S) {
int N = S.length();
LCPArray lCPArray = new LCPArray(S);
SuffixArray suffixArray = lCPArray.getSuffixArray();
long similaritySum = N; // initialize with string itself
int pivot = suffixArray.getRank(0); // the longest matching prefixes will be adjacent
int minMatchLength = Integer.MAX_VALUE; // afterwards, the length will be decreasing
for (int i = pivot; i >= 0 && lCPArray.get(i) > 0; --i) {
minMatchLength = Math.min(lCPArray.get(i), minMatchLength);
similaritySum += minMatchLength;
}
minMatchLength = Integer.MAX_VALUE;
for (int i = pivot + 1; i < N && lCPArray.get(i) > 0; ++i) {
minMatchLength = Math.min(lCPArray.get(i), minMatchLength);
similaritySum += minMatchLength;
}
return similaritySum;
}
public static void main(String[] args) throws IOException {
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
PrintWriter out = new PrintWriter(new BufferedWriter(new OutputStreamWriter(System.out)));
int T = Integer.parseInt(in.readLine());
for (int t = 0; t < T; ++t) {
String S = in.readLine();
out.println(sumSuffixSimilarities(S));
}
in.close();
out.close();
}
}
Some time ago, I was doing a problem on HackerRank that in introduced me to two new data structures that I want to write about. The problem is called Cross the River.
The premise is this:
You're standing on a shore of a river. You'd like to reach the opposite shore.
The river can be described with two straight lines on the Cartesian plane, describing the shores. The shore you're standing on is $Y=0$ and another one is $Y=H$.
There are some rocks in the river. Each rock is described with its coordinates and the number of points you'll gain in case you step on this rock.
You can choose the starting position arbitrarily on the first shore. Then, you will make jumps. More precisely, you can jump to the position $(X_2,Y_2)$ from the position $(X_1,Y_1)$ in case $\left|Y_2−Y_1\right| \leq dH$, $\left|X_2−X_1\right| \leq dW$ and $Y_2>Y_1$. You can jump only on the rocks and the shores.
What is the maximal sum of scores of all the used rocks you can obtain so that you cross the river, i.e. get to the opposite shore?
No two rocks share the same position, and it is guaranteed that there exists a way to cross the river.
Now, my first instinct was to use dynamic programming. If $Z_i$ is the point value of the rock, and $S_i$ is the max score at rock $i$, then $$ S_i = \begin{cases} Z_i + \max\{S_j : 1 \leq Y_i - Y_j \leq dH,~|X_i - X_j| \leq dW\} &\text{if rock is reachable} \\ -\infty~\text{otherwise,} \end{cases} $$ where we assume the existence of rocks with $Y$ coordinate $0$ of $0$ point value for all $X.$
Thus, we can sort the rocks by their $Y$ coordinate and visit them in order. However, we run into the problem that if $dW$ and $dH$ are large we may need to check a large number of rocks visited previously, so this approach is $O(N^2).$
My dynamic programming approach was the right idea, but it needs some improvements. Somehow, we need to speed up the process of looking through the previous rocks. To do this, we do two things:
- Implement a way to quickly find the max score in a range $[X-dW, X + dW]$
- Only store the scores of rocks in range $[Y-dH, Y)$
To accomplish these tasks, we use two specialized data structures.
Segment Trees
Segment trees solve the first problem. They provide a way to query a value (such as a maximum or minimum) over a range and update these values in $\log$ time. The key idea is to use a binary tree, where the nodes correspond to segments instead of indices.
For example suppose that we have $N$ indices $i = 0,1,\ldots, N-1$ with corresponding values $v_i.$ Let $k$ be the smallest integer such that $2^k \geq N.$ The root node of our binary tree will be the interval $[0,2^k).$ The first left child will be $[0,2^{k-1}),$ and the first right child will be $[2^{k-1},2^k).$ In general, we have for some node $[a,b)$ if $b - a > 1$, then the left child is $[a,(b-a)/2),$ and the right child is $[(b-a)/2,b).$ Otherwise, if $b - a = 1$, there are no children, and the node is a leaf. For example, if $5 \leq N \leq 8$, our segment tree looks like this.
In general, there are $2^0 + 2^1 + 2^2 + \cdots + 2^k = 2^{k+1} - 1$ nodes needed. $2N - 1 \leq 2^{k+1} - 1 \leq 2^2(N-1) - 1$, so the amount of memory needed is $O(N).$ Here's the code for constructing the tree.
class MaxSegmentTree {
private long[] maxes;
private int size;
public MaxSegmentTree(int size) {
int actualSize = 1;
while (actualSize < size) actualSize *= 2;
this.size = actualSize;
// if size is 2^k, we need 2^(k+1) - 1 nodes for all the intervals
maxes = new long[2*actualSize - 1];
Arrays.fill(maxes, Long.MIN_VALUE);
}
...
}
Now, for each node $[a,b),$ we store a value $\max(v_a,v_{a+1},\ldots,v_{b-1}).$ An update call consists of two parameters, an index $k$ and a new $v_k.$ We would traverse the binary tree until we reach the node $[k, k+1)$ and update that node. Then, we update the max of each ancestor by taking the max of its left and right child since the segment of child is always contained in the segment of the parent. In practice, this is done recursively like this.
class MaxSegmentTree {
...
public long set(int key, long value) {
return set(key, value, 0, 0, this.size);
}
/**
* @param node index of node since binary tree is implement with array
* @param l lower bound of segement (inclusive)
* @param r upper bound of segement (exclusive)
*/
private long set(int key, long value,
int node, int l, int r) {
// if not in range, do not set anything
if (key < l || key >= r) return maxes[node];
if (l + 1 == r) {
// return when you reach a leaf
maxes[node] = value;
return value;
}
int mid = l + (r-l)/2;
// left node
long left = set(key, value, 2*(node + 1) - 1, l, mid);
// right node
long right = set(key, value, 2*(node + 1), mid, r);
maxes[node] = Math.max(left, right);
return maxes[node];
}
...
}
A range max query takes two parameters: the lower bound of the range and the upper bound bound of the range in the form $[i,j).$ We obtain the max recursively. Let $[l,r)$ be the segment corresponding to a node. If $[l,r) \subseteq [i,j),$ we return the max associated with $[l,r)$. If $[l,r) \cap [i,j) = \emptyset,$ we ignore this node. Otherwise, $[l,r) \cap [i,j) \neq \emptyset,$ and $\exists k \in [l,r)$ such that $k \not\in [i,j),$ so $l < i < r$ or $l < j < r.$ In this case, we descend to the child nodes. The algorithm looks like this.
class MaxSegmentTree {
...
/**
* @param i from index, inclusive
* @param j to index, exclusive
* @return the max value in a segment.
*/
public long max(int i, int j) {
return max(i, j, 0, 0, this.size);
}
private long max(int i, int j, int node, int l, int r) {
// if in interval
if (i <= l && r <= j) return maxes[node];
// if completely outside interval
if (j <= l || i >= r ) return Long.MIN_VALUE;
int mid = l + (r-l)/2;
long left = max(i, j, 2*(node+1) - 1, l, mid);
long right = max(i, j, 2*(node+1), mid, r);
return Math.max(left, right);
}
...
}
I prove that this operation is $O(\log_2 N).$ To simplify things, let us assume that $N$ is a power of $2$, so $2^k = N.$ I claim that the worst case is $[i,j) = [1, 2^k - 1).$ Clearly this is true when $k = 2$ since we'll have to visit all the nodes but $[0,1)$ and $[3,4),$ so we visit $5 = 4k - 3 = 4\log_2 N - 3$ nodes.
Now, for our induction hypothesis we assume that the operation is $O(\log_2 N)$ for $1,2,\ldots, k - 1$. Then, for some $k$, we can assume that $i < 2^{k-1}$ and $j > 2^{k-1}$ since otherwise, we only descend one half of the tree, and it reduces to the $k - 1$ case. Now, given $[i, j)$ and some node $[l,r)$, we'll stop there if $[i,j) \cap [l,r) = \emptyset$ or $[l,r) \subseteq [i,j).$ Otherwise, we'll descend to the node's children. Now, we have assumed that $i < 2^{k-1} < j,$ so if we're on the left side of the tree, $j > r$ for all such nodes. We're not going to visit any nodes with $r \leq i,$ we'll stop at nodes with $l \geq i$ and compare their max, and we'll descend into nodes with $l < i < r$. At any given node on the left side, if $[l,r)$ is not a leaf and $l < i < r$, we'll choose to descend. Let the left child be $[l_l, r_l)$ and the right child be $[l_r,r_r)$. The two child segments are disjoint, so we will only choose to descend one of them since only one of $l_l < i < r_l$ or $l_r < i < r_r$ can be true. Since $l_l = l < i$, we'll stop only at the right child if $l_r = i.$ If $i$ is not odd, we'll stop before we reach a leaf. Thus, the worst case is when $i$ is odd.
On the right side, we reach a similar conclusion, where we stop when $r_l = j,$ and so the worst case is when $j$ is odd. To see this visually, here's an example of the query $[1,7)$ when $k = 3.$ Nodes where we visit the children are colored red. Nodes where we compare a max are colored green.
Thus, we'll descend at $2k - 1 = 2\log_2 N - 1$ nodes and compare maxes at $2(k-1) = 2(\log_2 N - 1)$ nodes, so $4\log_2 N - 3$ nodes are visited.
Max Queues
Now, the segment tree contains the max score at each $X$ coordinate, but we want to our segement tree to only contain values corresponding to rocks that are within range of our current position. If our current height is $Y$, we want rocks $j$ if $0 < Y - Y_j \leq dH.$
Recall that we visit the rocks in order of their $Y$ coordinate. Thus, for each $X$ coordinate we add the rock to some data structure when we visit it, and we remove it when it becomes out of range. Since rocks with smaller $Y$ coordinates become out of range first, this is a first in, first out (FIFO) situation, so we use a queue.
However, when removing a rock, we need to know when to update the segment tree. So, the queue needs to keep track of maxes. We can do this with two queues. The primary queue is a normal queue. The second queue will contain a monotone decreasing sequence. Upon adding to the queue, we maintain this invariant by removing all the smaller elements. In this way, the head of the queue will always contain the max element since it would have been removed otherwise. When we removing an element from the max queue, if the two heads are equal in value, we remove the head of each queue. Here is the code.
class MaxQueue<E extends Comparable<? super E>> extends ArrayDeque<E> {
private Queue<E> q; // queue of decreasing subsequence of elements (non-strict)
public MaxQueue() {
super();
q = new ArrayDeque<E>();
}
@Override
public void clear() {
q.clear();
super.clear();
}
@Override
public E poll() {
if (!super.isEmpty() && q.peek().equals(super.peek())) q.poll();
return super.poll();
}
@Override
public E remove() {
if (!super.isEmpty() && q.peek().equals(super.peek())) q.remove();
return super.remove();
}
@Override
public boolean add(E e) {
// remove all the smaller elements
while (!q.isEmpty() && q.peek().compareTo(e) < 0) q.poll();
q.add(e);
return super.add(e);
}
@Override
public boolean offer(E e) {
// remove all the smaller elements
while (!q.isEmpty() && q.peek().compareTo(e) < 0) q.poll();
q.offer(e);
return super.offer(e);
}
public E max() {
return q.element();
}
}
Solution
With these two data structures the solution is pretty short. We keep one segment tree that stores the current max at each $X$ coordinate. For each $X$, we keep a queue to keep track of all possible maxes. The one tricky part is to make sure that we look at all rocks at a certain height before updating the segment tree since lateral moves are not possible. Each rock is only added and removed from a queue once, and we can find the max in $\log$ time, so the running time is $O(N\log N)$, where $N$ is the number of rocks. Here's the code.
public class CrossTheRiver {
private static final int MAX_X = 100000;
...
public static void main(String[] args) throws IOException {
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
PrintWriter out = new PrintWriter(new BufferedWriter(new OutputStreamWriter(System.out)));
StringTokenizer st = new StringTokenizer(in.readLine());
int N = Integer.parseInt(st.nextToken()); // rocks
int H = Integer.parseInt(st.nextToken()); // height
int dH = Integer.parseInt(st.nextToken()); // max y jump
int dW = Integer.parseInt(st.nextToken()); // max x jump
Rock[] rocks = new Rock[N];
for (int i = 0; i < N; ++i) { // read through rocks
st = new StringTokenizer(in.readLine());
int Y = Integer.parseInt(st.nextToken());
int X = Integer.parseInt(st.nextToken()); // 0 index
int Z = Integer.parseInt(st.nextToken());
rocks[i] = new Rock(X, Y, Z);
}
Arrays.sort(rocks);
long[] cumulativeScore = new long[N];
MaxSegmentTree sTree = new MaxSegmentTree(MAX_X + 1);
ArrayList<MaxQueue<Long>> maxX = new ArrayList<MaxQueue<Long>>(MAX_X + 1);
for (int i = 0; i <= MAX_X; ++i) maxX.add(new MaxQueue<Long>());
int i = 0; // current rock
int j = 0; // in range rocks
while (i < N) {
int currentY = rocks[i].y;
while (rocks[j].y < currentY - dH) {
// clear out rocks that are out of range
maxX.get(rocks[j].x).poll();
if (maxX.get(rocks[j].x).isEmpty()) {
sTree.set(rocks[j].x, Long.MIN_VALUE);
} else {
sTree.set(rocks[j].x, maxX.get(rocks[j].x).max());
}
++j;
}
while (i < N && rocks[i].y == currentY) {
// get previous max score from segment tree
long previousScore = sTree.max(rocks[i].x - dW, rocks[i].x + dW + 1);
if (rocks[i].y <= dH && previousScore < 0) previousScore = 0;
if (previousScore > Long.MIN_VALUE) { // make sure rock is reachable
cumulativeScore[i] = rocks[i].score + previousScore;
// keep max queue up to date
maxX.get(rocks[i].x).add(cumulativeScore[i]);
}
++i;
}
// now update segment tree
for (int k = i - 1; k >= 0 && rocks[k].y == currentY; --k) {
if (cumulativeScore[k] == maxX.get(rocks[k].x).max()) {
sTree.set(rocks[k].x, cumulativeScore[k]);
}
}
}
long maxScore = Long.MIN_VALUE;
for (i = N - 1; i >= 0 && H - rocks[i].y <= dH; --i) {
if (maxScore < cumulativeScore[i]) maxScore = cumulativeScore[i];
}
out.println(maxScore);
in.close();
out.close();
}
}