I was recently doing a problem that involved depth-first search to find components of a graph. For me, the most natural way to write this is recursion. It's pretty simple:

void floodFill(int vertex, int currentComponent, const vector<set<int>> &edgeList,
               vector<int> &component, vector<vector<int>> &componentSets) {
  component[vertex] = currentComponent;
  componentSets.back().push_back(vertex);
  for (int i : edgeList[vertex]) {
    if (component[i] == -1) {
      floodFill(i, currentComponent, edgeList, component, componentSets);         
    }
  }
}

vector<vector<int>> findComponents(const vector<set<int>> &edgeList) {
  int N = edgeList.size();
  vector<int> component(N, -1);  
  vector<vector<int>> componentSets;
  int currentComponent = 0;
  for (int i = 0; i < N; ++i) {
    if (component[i] == -1) {
      componentSets.emplace_back();
      floodFill(i, currentComponent++, edgeList, component, componentSets);
    }
  }  
  return componentSets;  
}

component and componentSets are shared state variables, so we pass them by reference. It seems simple, enough, right? Unfortunately, the stack size is very limited on OS X. Trying to increase the stack size, I get

$ ulimit -s 65536
-bash: ulimit: stack size: cannot modify limit: Operation not permitted

The most that I can increase the stack size to with ulimit is a mere 8192 KB, which only works on a graph with less than 15,000 nodes.

Now the easiest solution is to increase the stack size with gcc like this

$ g++ -Wl,-stack_size,0x4000000 -stdlib=libc++ -std=c++0x D.cpp

We specify the stack size in bytes, so 0x4000000 is $4 \times 16^6 = 67108864$, which is 64 MB.

I want to cover a better solution that works everywhere, though. We can use an explicit stack for recursion:

vector<vector<int>> findComponents(const vector<set<int>> &edgeList) {
  int N = edgeList.size();
  vector<int> component(N, -1);  
  vector<vector<int>> componentSets;
  int currentComponent = 0;
  for (int i = 0; i < N; ++i) {
    if (component[i] == -1) {
      componentSets.emplace_back();
      stack<int> s; 
      s.push(i);
      component[i] = currentComponent;
      componentSets.back().push_back(i);
      while (!s.empty()) {
        int vertex = s.top(); s.pop();
        for (int j : edgeList[vertex]) {
          if (component[j] == -1) {
            component[j] = currentComponent;
            componentSets.back().push_back(j);
            s.push(j);
          }
        }
      }
      ++currentComponent;
    }
  }  
  return componentSets;  
}

This solution is actually more memory efficient since the implicit stack in the recursion solution contains the function state, which includes 5 arguments and other local variables. According to the online judge, the explicit stack-based solution is only using 2/3 of the memory as the recursive solution.

However, the astute reader may note that we aren't quite doing the same thing. For example, in the explicit stack verion, we assign the component before we push the vertex on the stack, whereas in the recursive version, we assign the component after we retrieve the vertex from the stack. This is necessary to avoid adding a vertex to a stack twice. Consider the undirected graph with the edges $\{(0,1), (0,2), (1,2)\}$.

  1. First, we push $0$ onto the stack.
  2. We pop $0$ and check its edges, and push $1$ and $2$ onto the stack.
  3. Stacks are last in, first out (LIFO), so we pop $2$ off the stack. Now, we check the edges of $2$. That includes an edge to $1$. If we added $1$ to the stack without assigning it to a component, yet, we would push $1$ onto the stack again.

We're also visiting visiting the nodes in a different order. In the recursive solution, we would visit an unassigned vertex as soon as we see it, so we would have $0 \rightarrow 1 \rightarrow 2$, but in this explict stack version, we get $0 \rightarrow 2 \rightarrow 1$.

An explicit stack version that better simulates what the recursive solution does is

vector<vector<int>> findComponents(const vector<set<int>> &edgeList) {
  int N = edgeList.size();
  vector<int> component(N, -1);  
  vector<vector<int>> componentSets;
  int currentComponent = 0;
  int iterations = 0;
  for (int i = 0; i < N; ++i) {
    if (component[i] == -1) {
      componentSets.emplace_back();
      stack<tuple<int, set<int>::iterator>> s; 
      s.emplace(i, edgeList[i].begin());
      while (!s.empty()) {
        // important to assign by reference
        tuple<int, set<int>::iterator> &state = s.top(); 
        int vertex = get<0>(state);
        set<int>::iterator &edgeIt = get<1>(state);        
        if (component[vertex] == -1) { // first time visiting
          component[vertex] = currentComponent;
          componentSets.back().push_back(vertex);
        }
        // if there is nothing else, pop
        bool shouldPop = true;
        while (edgeIt != edgeList[vertex].end()) {
          int j = *edgeIt; ++edgeIt;
          if (component[j] == -1) {
            s.emplace(j, edgeList[j].begin()); 
            shouldPop = false;  // something new is on stack, do not pop it
            break; // immediately put it on the stack, break, and visit next vertex
          } 
        }      
        if (shouldPop) s.pop();
      }
      ++currentComponent;
    }
  }  
  return componentSets;  
}

Now, we assign the component after getting the vertex off the stack, and when we see an unassigned vertex, we immediately put it on top of the stack instead of adding all unassigned vertices first. This uses slightly more memory because now the stack has to keep track of which edges were visited, so the state includes set<int>::iterator. Despite this explicit solution being more similar to recursion, I'd be hesitant to recommend this code because of how convoluted it is. You need to be very careful with references. Also, in more complicated cases, your state tuple will contain more than two variables, requiring you to keep track of more things and making it more likely that you will make a mistake.


New Comment


Comments

No comments have been posted yet. You can be the first!