SCC Gabow Style
Fists full of stacks
Table of Contents
After reworking topSort, the only function left that still used
Data.Graph was scc
. My first plan was to build on the depth-first
style searches I had written with time counters and implement the
standard double-dfs CLSR algorithm. Based on this comparison, I
instead opted to try for Gabow's path based approach.
First I'll describe how the algorithm works and show an example scheme implementation. Then I'll discuss the haskell implementation for alga which augments that algorithm slightly to describe edge classification and build an explicit condensation graph.
Gabow's SCC Algorithm
The goal is to take a graph \(G\) and compute its condensation. The condensation of a graph is another graph whose vertices are the mutually reachable vertices of \(G\), and whose edges are inherited from \(G\) (except would-be self loops, which are not). Gabow's path-based SCC algorithm does this by doing a preorder traversal of the graph, maintaining two stacks of vertices. The boundary stack is used to find representative vertices of components. The path stack is used to keep track of which vertices belong to these components.
Like many graph algorithms, the path-based approach can be described by what happens when entering a vertex, following its out-edges, and exiting it. To fulfill alga's desired type, getting an explicit condensation, it's also necessary classify the edges in the process.
Description
The plan of attack is to keep track of which vertices have been visited, when they were, and which component they belong to. We discover the components by pushing vertices to stacks when visiting and inspecting stacks when leaving. So, the stateful ingredients are:
- enter time/preorder id counter
- component counter
- path stack
- boundary stack
- table mapping \(v \mapsto pre[v]\)
- table mapping \(v \mapsto scc[v]\)
- table of edges, which says if \(u\rightarrow v\) is an edge between components in the condensation, or an edge between vertices inside a component (inside a vertex of condensation).
The core idea is that (back) edges that point from vertices with higher preorder times to lower preorder times induce cycles and cycles induce components. The boundary stack is popped until the vertex not yet assigned to a component with the lowest reachable preorder id is on top. Representative vertices are discovered by looking at the boundary stack at exit. If the vertex being exited is found to also be at the top of the boundary stack, it represents the strongly connected component containing all the path stack up to itself. This process partitions the vertices according to strongly connected components.
Here is the state data type for the haskell implementation. The scheme version just mutates local variables.
Enter
When a vertex \(u\) is entered, the entry time is recorded, it added to the path stack and to the boundary stack. A piece of information this gives us is: vertices have been visited if and only if they have been assigned a pre-order id. See enter vertex.
Out Edges
For each out edge \(u \rightarrow v\), the state of \(v\) guides how to act:
- \(v\) hasn't been visited (has no pre-order id). Enter \(v\) – this is a depth first traversal, after all!
- \(v\) has been entered and has been assigned to a component. \(u\) can't belong to that component, so the edge \(u \rightarrow v\) is between vertices in the condensation. The stacks aren't modified.
- \(v\) has a pre-order id, but no component id. It must be the case that \(pre[v] < pre[u]\) and that \(u,v\) belong to the same component. The boundary stack is popped until the top vertex \(b\) has \(pre[b]\le pre[v]\). The effect of this last step is clearer after explaining exit.
Exit
After exploring all branches out of \(u\), the stacks indicate if \(u\) represents a component or not. If \(u\) is at the top of the boundary stack, none of its descendents pointed to a vertex with an earlier pre-order id. So, \(u\) represents some component containing all the vertices on the paths stack up to and including itself. Those vertices are removed from the path stack and assigned to a component accordining to the component counter. See exit vertex.
Example
Scheme Implementation
Here is a scheme implementation I wrote while working out the haskell one. It doesn't explicitly build a condensation, only a table mapping vertices to component ids. I find it easier to follow.
(define scc (lambda (G) (let ((preorder 0) (component 0) (preorders (make-hash-table)) (components (make-hash-table)) (boundary '()) (path '())) (define (preorder-id v) (hashtable-ref preorders v #f)) (define (component-id v) (hashtable-ref components v #f)) (define (set-component! v) (hashtable-set! components v component)) (define (set-preorder! v) (hashtable-set! preorders v preorder) (inc! preorder)) (define (enter v) (set-preorder! v) (push! v boundary) (push! v path)) (define (pop-boundary v) (let ((p-v (preorder-id v))) (let walk ((x (pop! boundary))) (if (< p-v (preorder-id x)) (walk (pop! boundary)) (push! x boundary))))) (define (exit v) (when (= v (car boundary)) (pop! boundary) (set-component! v) (let walk ((x (pop! path))) (unless (= x v) (set-component! x) (walk (pop! path)))) (inc! component))) (define (dfs u) (enter u) (for-all (lambda (v) (if (preorder-id v) (unless (component-id v) (pop-boundary v)) (dfs v))) (adjacent u G)) (exit u)) (for-all (lambda (v) (unless (preorder-id v) (dfs v))) (vertex-list G)) components)))
Haskell Implementation
scc :: Ord a => AdjacencyMap a -> AcyclicAdjacencyMap (NonEmptyAdjacencyMap a) scc = ...
This type is aspirational. While the outer part of the condensation graph is indeed acyclic, for now version 0.5 of alga returns a plain adjacency map. This will likely adapt in the future. Here's the actual type
scc :: Ord a => AdjacencyMap a -> AdjacencyMap (NonEmptyAdjacencyMap a) scc = ...
Edge classification
In the same way that the verticies are assigned to componnents, so too can edges be classified based on vertex pre-order times. The search state is augmented by including a stack of edges analogous to the path stack. On finding a vertex \(u\) which represents some SCC, edges \(v \rightarrow w\) in this edge stack are in the newly found component/vertex of the condensation exactly when minimum preorder id is greater or equal to that of \(u\), ie \[\min(pre[v],pre[w])\ge pre[u].\] The rest belong to another component. Here's the description for following out edges from the other page adapted to the edge classification point of view.
Following Out Edges
For each out edge \(u \rightarrow v\), the state of \(v\) guides how to classify:
- \(v\) hasn't been visited (has no pre-order id). dfs from \(v\). Edge \(u \rightarrow v\) is an inner edge if and only if \(v\) isn't a representative vertex. In other words, \(v\) will be at the top of the boundary stack when it's exited exactly when none of its descendents pointed to an ancestor. If a new component was found exiting v, \(u \rightarrow v\) is an edge between vertices of the condensation. Otherwise, push it on the edge stack.
- \(v\) has been entered and has been assigned to a component. \(u\) can't belong to that component, so the edge \(u \rightarrow v\) is between vertices in the condensation.
- \(v\) has a pre-order id, but no component id. \(u \rightarrow v\) is an edge inside a vertex of the condensation graph.
Condensation
Before getting to the nitty gritty the final step is the
condensing. As components or vertices of the condensation are
discovered (see the horrible insertComponent function below), they
are constructed and put into (yet) a(nother!) stack,
\(innerGraphs\). The first draft of the implmentation used an IntMap
,
but this trick avoids more logs (err, yeah IntMap lookup is still
pretty much log, no matter what they say)
The rest is ceremony to "coerce" the components to type NonEmpty
and
overlay the vertices and edges of the condensation.
condense :: Ord a => AdjacencyMap a -> StateSCC a -> AdjacencyMap (NonEmpty.AdjacencyMap a) condense g (SCC _ n _ _ _ assignment inner _ outer) | n == 1 = vertex $ convert g | otherwise = gmap (\c -> inner' Array.! (n-1-c)) outer' where inner' = Array.listArray (0,n-1) (convert <$> inner) outer' = es `overlay` vs vs = vertices [0..n-1] es = edges [ (sccid x, sccid y) | (x,y) <- outer ] sccid v = assignment Map.! v convert = fromJust . NonEmpty.toNonEmpty
Haskell Data Type
Here is the outer call and the hairy state data type. Since haskell map lookups cost a log, we store vertices along with their pre order id in the boundary stack to avoid that.
scc :: Ord a => AdjacencyMap a -> AdjacencyMap (NonEmpty.AdjacencyMap a) scc g = condense g $ execState (gabowSCC g) initialState where initialState = SCC 0 0 [] [] Map.empty Map.empty [] [] [] data StateSCC a = SCC { preorder :: {-# unpack #-} !Int , component :: {-# unpack #-} !Int , boundaryStack :: [(Int,a)] , pathStack :: [a] , preorders :: Map.Map a Int , components :: Map.Map a Int , innerGraphs :: [AdjacencyMap a] , innerEdges :: [(Int,(a,a))] , outerEdges :: [(a,a)] }
Nitty Gritty
Simplified haskell implementation
That's all the set up. Now to describe the it in the flesh. This version is simplified in that it ignores edge classification.
gabowSCC :: Ord a => AdjacencyMap a -> State (StateSCC a) () gabowSCC g = do let dfs u = do -- give u preorder id and push to stacks enter u forM_ (adjacent u) $ \v -> do -- for v in out edges preorderId v >>= \case -- if v not visited dfs v Nothing -> dfs v Just p_v -> do -- check if v has not been assigned to scc scc_v <- hasComponent v -- if not, contract stacks, putting vertex -- with lowest possible pre-id on top unless scc_v $ popBoundary p_v -- maybe discover a new scc, maybe not. exit u -- usual dfs of graph forM_ (vertexList g) $ \v -> do assigned <- hasPreorderId v -- vertex has been traversed iff it has (no) PreorderId unless assigned $ dfs v
Now for the helper functions, the meat outside the dfs. Note that the edge classification bits are included in these.
- Enter Vertex
Assign pre order, push to stacks. Since lookups for maps in haskell costs a \(\log\), we store vertices along with their pre order id in the boundary stack to avoid doing that.
enter v = do C pre scc bnd pth pres sccs gs es_i es_o <- get let pre' = pre+1 -- new enter time counter bnd' = (pre,v):bnd -- push to boundary stack pth' = v:pth -- push to path stack pres' = Map.insert v pre pres -- update preorder/time table put $ C pre' scc bnd' pth' pres' sccs gs es_i es_o return pre
- Contract Boundary Stack
\(popBoundary\) is be called when considering edges \(u \rightarrow v\) if \(v\) hasn't been assigned to a component and \(pre[v] < pre[u]\). Vertices \(w\) with preorder \(pre[w] > pre[v]\) are discarded and \(v\) may or may not come to represent the enclosing component.
popBoundary p_v = modify' (\(C pre scc bnd pth pres sccs gs es_i es_o) -> C pre scc (dropWhile ((>p_v).fst) bnd) pth pres sccs gs es_i es_o)
- Exit Vertex
When a new compoent is found, this updates inner edges, outer edges, vertices of condensation graph, path stacks, scc id counter.
exit v = do newComponent <- (v==).snd.head <$> gets boundary when newComponent $ insertComponent v return newComponent insertComponent v = modify' (\(SCC pre -- preorder id/entry time counter scc -- scc id counter bnd -- boundary stack pth -- path stack pres -- preorder table sccs -- scc id table gs -- vertices of condensation, indexed by scc id es_i -- inner edge path stack, popped section to be put in gs table es_o -- outer edges. to be used after traversal to condense graph ) -> let -- these vertices form the newly completed SCC (curr,v_pth') = span (/=v) pth -- pop vertices in completed component up to and including v pth' = tail v_pth' -- Here we know that v_pth' starts with v -- contract inide edge stack based on preorder id (es,es_i') = span ((>=p_v).fst) es_i -- new subgraph/condensation vertex g_i | null es = vertex v | otherwise = edges (snd <$> es) -- lowest time/preorder id in new scc is top of stack p_v = fst $ head bnd scc' = scc + 1 -- new SCC id bnd' = tail bnd -- pop boundary stack -- give vertices up to v in path stack a new SCC id sccs' = List.foldl' (\sccs x -> Map.insert x scc sccs) sccs (v:curr) gs' = g_i:gs -- insert subgraph in condensation table in SCC pre scc' bnd' pth' pres sccs' gs' es_i' es_o)
Gabow with edge classification
The full uncensored version, including edge classification:
gabowSCC :: AdjacencyIntMap -> State StateSCC () gabowSCC g = do let dfs u = do -- grab current time for help with edge classification p_u <- enter u forM_ (adjacent u) $ \v -> -- for out edges (u,v): preorderId v >>= \case Nothing -> do -- v not visited updated <- dfs v if updated -- if v formed new SCC then outedge (u,v) -- then: (u,v) is outside edge else inedge (p_u,(u,v)) -- else: (u,v) is inside edge Just p_v -> do -- v visited: scc_v <- hasComponent v if scc_v -- if v already assigned component then outedge (u,v) -- then: (u,v) outside edge -- else: update boundary and put edge -- (u,v) in edge path stack analogue where -- some inside edges go in next found scc, -- but possibly not all else popBoundary p_v >> inedge (p_v,(u,v)) exit u forM_ (vertexList g) $ \v -> do assigned <- hasPreorderId v unless assigned $ void $ dfs v