Top Sort Alga
The next stop in my contribution to alga was implementing topSort
sans Data.Graph
.
Topologically sorting a graph is giving an order to its vertices such that if \(x, \dots, y\) appears in the resulting enumeration, there is no edge \(y \rightarrow x\) in the (transitive closure of the) graph.
Data.Graph.topSort
is debatably too tolerant as it will return the
exit order of the depth first traversal, without rejecting cyclic
graphs. Previously, alga's implementation would run
Data.Graph.topSort
and then verify the result is a valid topological
ordering by using an algorithm developed in the paper Depth-First
Search and Strong Connectivity in Coq.
Thus, the original type in alga's topSort
was topSort :: Ord a =>
AdjacencyMap a -> Maybe [a]
. In pursuit of progress but also for fun,
for the new implementation of topSort
we decided why not present an
explicit cycle if one is found? Enter the new type: topSort :: Ord a
=> AdjacencyMap a -> Either (Cycle a) [a]
where
type Cycle = NonEmpty
Cycles
Representing cycles with nonempty lists (where the last element points
to the head) uses a classic nicety of haskell types where nonsensical
states aren't representable: what is an empty cycle, anyway? The
initial impetus for realizing this type came in writing the retrace
inner function to build up the discovered cycle, where the compiler
complained about inexhaustive patterns since there was no point in
matchcing against the empty list!
topSort' g = ... where ... retrace curr head parent = aux (curr :| []) where aux xs@(curr :| _) | head == curr = xs | otherwise = aux (parent Map.! curr <| xs)
ABC
Before going into the details of the implementation, there was a small issue that had been raised about which topological sort was returned. In general there are many valid topological orderings, so long as the first property I mentioned holds. Data.Graph's topsort returns
> topSort $ 3 * (1 * 4 + 2 * 5) Just [3,2,1,4,5]
but the lexicacographically least possible enumeration is
[3,1,2,4,5]
. The "fix" was pretty simple, since vertices are added
to the stack at exit, the way to get the alphabetically first ordering
is to procrastinate checking the small letters. That's why in the code
below vertices
and adjacent
are defined
... do let vertices = map fst $ Map.toDescList $ adjacencyMap g adjacent = Set.toDescList . flip postSet g ...
Nitty Gritty
There are a few additions to the search state that are needed relative
to dfs
and bfs
in order to achieve these goals. First the state of
vertices needs more information than true | false
whether a vertex
is explored or not; to detect cycles we also need to know if its tree
is being explored or has been explored. Second, In order to present a
cycle, we also want to add a table in which the parent pointers are
stored for retracing. Last, if the graph is cyclic we want to exit
early – callCC
! So:
data NodeState = Entered | Exited data S a = S { parent :: Map.Map a a , entry :: Map.Map a NodeState , order :: [a] } topSort' :: (Ord a, MonadState (S a) m, MonadCont m) => AdjacencyMap a -> m (Either (Cycle a) [a]) topSort' g = callCC $ \cyclic -> do ...
where S
is an internal data type that lies inside the state while
searching. The last bit of bureaucracy is classifying what state in
searching implies a cycle. The structure of the traversal is: for each
vertex \(v\) dfsRoot v
. dfsRoot
wraps dfs
since roots of trees
have no parent and we're interested in parents. In traversing with
dfs
, the state of child \(y\) determines if it is in a cycle with
vertex \(x\):
- if \(y\) has been
Exited
edge \(x \rightarrow y\) is a forward edge, and there's no cycle containing \(x,y\). - if \(y\) has not been traversed, it may or may not be in a cycle with \(x\), we traverse its tree to find out.
- if \(y\)'s tree is in the process of being traversed (as is \(x\)'s), then we know edge \(x\rightarrow y\) completes a cycle that we can then retrace!
topSort' :: (Ord a, MonadState (S a) m, MonadCont m) => AdjacencyMap a -> m (Either (Cycle a) [a]) topSort' g = callCC $ \cyclic -> do let vertices = map fst $ Map.toDescList $ adjacencyMap g adjacent = Set.toDescList . flip postSet g dfsRoot x = nodeState x >>= \case Nothing -> enterRoot x >> dfs x >> exit x _ -> return () dfs x = forM_ (adjacent x) $ \y -> nodeState y >>= \case Nothing -> enter x y >> dfs y >> exit y Just Exited -> return () Just Entered -> cyclic . Left . retrace x y =<< gets parent forM_ vertices dfsRoot Right <$> gets order
For completeness, the internal helper functions. When entering a
vertex \(v\) through \(u \rightarrow v\), record this. When exiting a
vertex \(v\), push it into the topological ordering we're building
up. Whereas cycles had a nice and easy representation in haskell
types, I don't believe it's easy to express that exit
never leaves
the same vertex twice (fancier types needed?)… If you know how,
please share! For time being, the wart with error
stays…
nodeState v = gets (Map.lookup v . entry) enter u v = modify' (\(S m n vs) -> S (Map.insert v u m) (Map.insert v Entered n) vs) enterRoot v = modify' (\(S m n vs) -> S m (Map.insert v Entered n) vs) exit v = modify' (\(S m n vs) -> S m (Map.alter (fmap leave) v n) (v:vs)) where leave = \case Entered -> Exited Exited -> error "Internal error: dfs search order violated" retrace curr head parent = aux (curr :| []) where aux xs@(curr :| _) | head == curr = xs | otherwise = aux (parent Map.! curr <| xs)
Everything so far has been inner/private/local to the module. At last, here is the exposed wrapper:
topSort :: Ord a => AdjacencyMap a -> Either (Cycle a) [a] topSort g = runContT (evalStateT (topSort' g) initialState) id where initialState = S Map.empty Map.empty []