SCC Gabow Style
Fists full of stacks

Gabow's SCC Algorithm
Haskell Implementation

After reworking topSort, the only function left that still used Data.Graph was scc. My first plan was to build on the depth-first style searches I had written with time counters and implement the standard double-dfs CLSR algorithm. Based on this comparison, I instead opted to try for Gabow's path based approach.

First I'll describe how the algorithm works and show an example scheme implementation. Then I'll discuss the haskell implementation for alga which augments that algorithm slightly to describe edge classification and build an explicit condensation graph.

Gabow's SCC Algorithm

The goal is to take a graph \(G\) and compute its condensation. The condensation of a graph is another graph whose vertices are the mutually reachable vertices of \(G\), and whose edges are inherited from \(G\) (except would-be self loops, which are not). Gabow's path-based SCC algorithm does this by doing a preorder traversal of the graph, maintaining two stacks of vertices. The boundary stack is used to find representative vertices of components. The path stack is used to keep track of which vertices belong to these components.

Like many graph algorithms, the path-based approach can be described by what happens when entering a vertex, following its out-edges, and exiting it. To fulfill alga's desired type, getting an explicit condensation, it's also necessary classify the edges in the process.

Description

The plan of attack is to keep track of which vertices have been visited, when they were, and which component they belong to. We discover the components by pushing vertices to stacks when visiting and inspecting stacks when leaving. So, the stateful ingredients are:

enter time/preorder id counter
component counter
path stack
boundary stack
table mapping \(v \mapsto pre[v]\)
table mapping \(v \mapsto scc[v]\)
table of edges, which says if \(u\rightarrow v\) is an edge between components in the condensation, or an edge between vertices inside a component (inside a vertex of condensation).

The core idea is that (back) edges that point from vertices with higher preorder times to lower preorder times induce cycles and cycles induce components. The boundary stack is popped until the vertex not yet assigned to a component with the lowest reachable preorder id is on top. Representative vertices are discovered by looking at the boundary stack at exit. If the vertex being exited is found to also be at the top of the boundary stack, it represents the strongly connected component containing all the path stack up to itself. This process partitions the vertices according to strongly connected components.

Here is the state data type for the haskell implementation. The scheme version just mutates local variables.

Enter

When a vertex \(u\) is entered, the entry time is recorded, it added to the path stack and to the boundary stack. A piece of information this gives us is: vertices have been visited if and only if they have been assigned a pre-order id. See enter vertex.

Out Edges

For each out edge \(u \rightarrow v\), the state of \(v\) guides how to act:

\(v\) hasn't been visited (has no pre-order id). Enter \(v\) – this is a depth first traversal, after all!
\(v\) has been entered and has been assigned to a component. \(u\) can't belong to that component, so the edge \(u \rightarrow v\) is between vertices in the condensation. The stacks aren't modified.
\(v\) has a pre-order id, but no component id. It must be the case that \(pre[v] < pre[u]\) and that \(u,v\) belong to the same component. The boundary stack is popped until the top vertex \(b\) has \(pre[b]\le pre[v]\). The effect of this last step is clearer after explaining exit.

Exit

After exploring all branches out of \(u\), the stacks indicate if \(u\) represents a component or not. If \(u\) is at the top of the boundary stack, none of its descendents pointed to a vertex with an earlier pre-order id. So, \(u\) represents some component containing all the vertices on the paths stack up to and including itself. Those vertices are removed from the path stack and assigned to a component accordining to the component counter. See exit vertex.

Example

Here is a dodgy diagram I made that traces the haskell implementation. The graph (in alga notation) is \(3*1*4*1*5\).

Scheme Implementation

Here is a scheme implementation I wrote while working out the haskell one. It doesn't explicitly build a condensation, only a table mapping vertices to component ids. I find it easier to follow.

(define scc
  (lambda (G)
    (let ((preorder 0)
          (component 0)
          (preorders (make-hash-table))
          (components (make-hash-table))
          (boundary '())
          (path '()))
      (define (preorder-id v)
        (hashtable-ref preorders v #f))
      (define (component-id v)
        (hashtable-ref components v #f))
      (define (set-component! v)
        (hashtable-set! components v component))
      (define (set-preorder! v)
        (hashtable-set! preorders v preorder)
        (inc! preorder))
      (define (enter v)
        (set-preorder! v)
        (push! v boundary)
        (push! v path))
      (define (pop-boundary v)
        (let ((p-v (preorder-id v)))
          (let walk ((x (pop! boundary)))
            (if (< p-v (preorder-id x))
                (walk (pop! boundary))
                (push! x boundary)))))
      (define (exit v)
        (when (= v (car boundary))
          (pop! boundary)
          (set-component! v)
          (let walk ((x (pop! path)))
            (unless (= x v)
              (set-component! x)
              (walk (pop! path))))
          (inc! component)))
      (define (dfs u)
        (enter u)
        (for-all (lambda (v)
                   (if (preorder-id v)
                       (unless (component-id v)
                         (pop-boundary v))
                       (dfs v)))
                 (adjacent u G))
        (exit u))
      (for-all (lambda (v)
                 (unless (preorder-id v)
                   (dfs v)))
               (vertex-list G))
      components)))

Haskell Implementation

scc :: Ord a => AdjacencyMap a -> AcyclicAdjacencyMap (NonEmptyAdjacencyMap a)
scc = ...

This type is aspirational. While the outer part of the condensation graph is indeed acyclic, for now version 0.5 of alga returns a plain adjacency map. This will likely adapt in the future. Here's the actual type

scc :: Ord a => AdjacencyMap a -> AdjacencyMap (NonEmptyAdjacencyMap a)
scc = ...

Edge classification

In the same way that the verticies are assigned to componnents, so too can edges be classified based on vertex pre-order times. The search state is augmented by including a stack of edges analogous to the path stack. On finding a vertex \(u\) which represents some SCC, edges \(v \rightarrow w\) in this edge stack are in the newly found component/vertex of the condensation exactly when minimum preorder id is greater or equal to that of \(u\), ie \[\min(pre[v],pre[w])\ge pre[u].\] The rest belong to another component. Here's the description for following out edges from the other page adapted to the edge classification point of view.

Following Out Edges

For each out edge \(u \rightarrow v\), the state of \(v\) guides how to classify:

\(v\) hasn't been visited (has no pre-order id). dfs from \(v\). Edge \(u \rightarrow v\) is an inner edge if and only if \(v\) isn't a representative vertex. In other words, \(v\) will be at the top of the boundary stack when it's exited exactly when none of its descendents pointed to an ancestor. If a new component was found exiting v, \(u \rightarrow v\) is an edge between vertices of the condensation. Otherwise, push it on the edge stack.
\(v\) has been entered and has been assigned to a component. \(u\) can't belong to that component, so the edge \(u \rightarrow v\) is between vertices in the condensation.
\(v\) has a pre-order id, but no component id. \(u \rightarrow v\) is an edge inside a vertex of the condensation graph.

Condensation

Before getting to the nitty gritty the final step is the condensing. As components or vertices of the condensation are discovered (see the horrible insertComponent function below), they are constructed and put into (yet) a(nother!) stack, \(innerGraphs\). The first draft of the implmentation used an IntMap, but this trick avoids more logs (err, yeah IntMap lookup is still pretty much log, no matter what they say)

The rest is ceremony to "coerce" the components to type NonEmpty and overlay the vertices and edges of the condensation.

condense :: Ord a => AdjacencyMap a -> StateSCC a
         -> AdjacencyMap (NonEmpty.AdjacencyMap a)
condense g (SCC _ n _ _ _ assignment inner _ outer)
  | n == 1 = vertex $ convert g
  | otherwise = gmap (\c -> inner' Array.! (n-1-c)) outer'
  where inner' = Array.listArray (0,n-1) (convert <$> inner)
        outer' = es `overlay` vs
        vs = vertices [0..n-1]
        es = edges [ (sccid x, sccid y) | (x,y) <- outer ]
        sccid v = assignment Map.! v
        convert = fromJust . NonEmpty.toNonEmpty

Haskell Data Type

Here is the outer call and the hairy state data type. Since haskell map lookups cost a log, we store vertices along with their pre order id in the boundary stack to avoid that.

scc :: Ord a => AdjacencyMap a -> AdjacencyMap (NonEmpty.AdjacencyMap a)
scc g = condense g $ execState (gabowSCC g) initialState where
  initialState = SCC 0 0 [] [] Map.empty Map.empty [] [] []

data StateSCC a
  = SCC { preorder      :: {-# unpack #-} !Int
        , component     :: {-# unpack #-} !Int
        , boundaryStack :: [(Int,a)]
        , pathStack     :: [a]
        , preorders     :: Map.Map a Int
        , components    :: Map.Map a Int
        , innerGraphs   :: [AdjacencyMap a]
        , innerEdges    :: [(Int,(a,a))]
        , outerEdges    :: [(a,a)] }

Nitty Gritty

Simplified haskell implementation

That's all the set up. Now to describe the it in the flesh. This version is simplified in that it ignores edge classification.

gabowSCC :: Ord a => AdjacencyMap a -> State (StateSCC a) ()
gabowSCC g =
  do let dfs u = do  -- give u preorder id and push to stacks
                    enter u
                    forM_ (adjacent u) $ \v -> do -- for v in out edges
                      preorderId v >>= \case
                        -- if v not visited dfs v
                        Nothing  -> dfs v
                        Just p_v -> do
                           -- check if v has not been assigned to scc
                          scc_v <- hasComponent v
                          -- if not, contract stacks, putting vertex
                          -- with lowest possible pre-id on top
                          unless scc_v $ popBoundary p_v
                    -- maybe discover a new scc, maybe not.
                    exit u
     -- usual dfs of graph
     forM_ (vertexList g) $ \v -> do
       assigned <- hasPreorderId v
       -- vertex has been traversed iff it has (no) PreorderId
       unless assigned $ dfs v

Now for the helper functions, the meat outside the dfs. Note that the edge classification bits are included in these.

Enter Vertex

Assign pre order, push to stacks. Since lookups for maps in haskell costs a \(\log\), we store vertices along with their pre order id in the boundary stack to avoid doing that.

enter v = do C pre scc bnd pth pres sccs gs es_i es_o <- get
             let pre' = pre+1 -- new enter time counter
                 bnd' = (pre,v):bnd -- push to boundary stack
                 pth' = v:pth -- push to path stack
                 pres' = Map.insert v pre pres -- update preorder/time table
             put $ C pre' scc bnd' pth' pres' sccs gs es_i es_o
             return pre

Contract Boundary Stack
\(popBoundary\) is be called when considering edges \(u \rightarrow v\) if \(v\) hasn't been assigned to a component and \(pre[v] < pre[u]\). Vertices \(w\) with preorder \(pre[w] > pre[v]\) are discarded and \(v\) may or may not come to represent the enclosing component.
```
popBoundary p_v = modify'
  (\(C pre scc bnd pth pres sccs gs es_i es_o) ->
     C pre scc (dropWhile ((>p_v).fst) bnd) pth pres sccs gs es_i es_o)
```

Exit Vertex

When a new compoent is found, this updates inner edges, outer edges, vertices of condensation graph, path stacks, scc id counter.

exit v = do newComponent <- (v==).snd.head <$> gets boundary
            when newComponent $ insertComponent v
            return newComponent

insertComponent v = modify'
  (\(SCC pre -- preorder id/entry time counter
         scc -- scc id counter
         bnd -- boundary stack
         pth -- path stack
         pres -- preorder table
         sccs -- scc id table
         gs -- vertices of condensation, indexed by scc id
         es_i -- inner edge path stack, popped section to be put in gs table 
         es_o -- outer edges. to be used after traversal to condense graph
   ) -> let -- these vertices form the newly completed SCC
            (curr,v_pth') = span (/=v) pth
            -- pop vertices in completed component up to and including v
            pth' = tail v_pth' -- Here we know that v_pth' starts with v
            -- contract inide edge stack based on preorder id
            (es,es_i') = span ((>=p_v).fst) es_i
            -- new subgraph/condensation vertex
            g_i | null es = vertex v
                | otherwise = edges (snd <$> es)
            -- lowest time/preorder id in new scc is top of stack
            p_v = fst $ head bnd 
            scc' = scc + 1 -- new SCC id
            bnd' = tail bnd  -- pop boundary stack
            -- give vertices up to v in path stack a new SCC id
            sccs' = List.foldl' (\sccs x -> Map.insert x scc sccs) 
                                sccs
                                (v:curr)
            gs' = g_i:gs -- insert subgraph in condensation table
         in SCC pre scc' bnd' pth' pres sccs' gs' es_i' es_o)

Gabow with edge classification

The full uncensored version, including edge classification:

gabowSCC :: AdjacencyIntMap -> State StateSCC ()
gabowSCC g =
  do let dfs u = do -- grab current time for help with edge classification
                    p_u <- enter u
                    forM_ (adjacent u) $ \v -> -- for out edges (u,v):
                      preorderId v >>= \case
                        Nothing  -> do -- v not visited
                          updated <- dfs v
                          if updated -- if v formed new SCC
                            then outedge (u,v) -- then: (u,v) is outside edge
                            else inedge (p_u,(u,v)) -- else: (u,v) is inside edge
                        Just p_v -> do -- v visited:
                          scc_v <- hasComponent v
                          if scc_v -- if v already assigned component
                            then outedge (u,v) -- then: (u,v) outside edge
                            -- else: update boundary and put edge
                            -- (u,v) in edge path stack analogue where
                            -- some inside edges go in next found scc,
                            -- but possibly not all
                            else popBoundary p_v >> inedge (p_v,(u,v))
                    exit u
     forM_ (vertexList g) $ \v -> do
       assigned <- hasPreorderId v
       unless assigned $ void $ dfs v

SCC Gabow Style Fists full of stacks

Table of Contents