Top Sort Alga

The next stop in my contribution to alga was implementing topSort sans Data.Graph.

Topologically sorting a graph is giving an order to its vertices such that if \(x, \dots, y\) appears in the resulting enumeration, there is no edge \(y \rightarrow x\) in the (transitive closure of the) graph.

Data.Graph.topSort is debatably too tolerant as it will return the exit order of the depth first traversal, without rejecting cyclic graphs. Previously, alga's implementation would run Data.Graph.topSort and then verify the result is a valid topological ordering by using an algorithm developed in the paper Depth-First Search and Strong Connectivity in Coq.

Thus, the original type in alga's topSort was topSort :: Ord a => AdjacencyMap a -> Maybe [a]. In pursuit of progress but also for fun, for the new implementation of topSort we decided why not present an explicit cycle if one is found? Enter the new type: topSort :: Ord a => AdjacencyMap a -> Either (Cycle a) [a] where

type Cycle = NonEmpty

Cycles

Representing cycles with nonempty lists (where the last element points to the head) uses a classic nicety of haskell types where nonsensical states aren't representable: what is an empty cycle, anyway? The initial impetus for realizing this type came in writing the retrace inner function to build up the discovered cycle, where the compiler complained about inexhaustive patterns since there was no point in matchcing against the empty list!

topSort' g = ... where
    ...
    retrace curr head parent = aux (curr :| []) where
      aux xs@(curr :| _)
        | head == curr = xs
        | otherwise = aux (parent Map.! curr <| xs)

ABC

Before going into the details of the implementation, there was a small issue that had been raised about which topological sort was returned. In general there are many valid topological orderings, so long as the first property I mentioned holds. Data.Graph's topsort returns

> topSort $ 3 * (1 * 4 + 2 * 5)
Just [3,2,1,4,5]

but the lexicacographically least possible enumeration is [3,1,2,4,5]. The "fix" was pretty simple, since vertices are added to the stack at exit, the way to get the alphabetically first ordering is to procrastinate checking the small letters. That's why in the code below vertices and adjacent are defined

...
  do let vertices = map fst $ Map.toDescList $ adjacencyMap g
         adjacent = Set.toDescList . flip postSet g
     ...

Nitty Gritty

There are a few additions to the search state that are needed relative to dfs and bfs in order to achieve these goals. First the state of vertices needs more information than true | false whether a vertex is explored or not; to detect cycles we also need to know if its tree is being explored or has been explored. Second, In order to present a cycle, we also want to add a table in which the parent pointers are stored for retracing. Last, if the graph is cyclic we want to exit early – callCC! So:

data NodeState = Entered | Exited
data S a = S { parent :: Map.Map a a
             , entry  :: Map.Map a NodeState
             , order  :: [a] }

topSort' :: (Ord a, MonadState (S a) m, MonadCont m)
         => AdjacencyMap a -> m (Either (Cycle a) [a])
topSort' g = callCC $ \cyclic ->
  do ...

where S is an internal data type that lies inside the state while searching. The last bit of bureaucracy is classifying what state in searching implies a cycle. The structure of the traversal is: for each vertex \(v\) dfsRoot v. dfsRoot wraps dfs since roots of trees have no parent and we're interested in parents. In traversing with dfs, the state of child \(y\) determines if it is in a cycle with vertex \(x\):

  • if \(y\) has been Exited edge \(x \rightarrow y\) is a forward edge, and there's no cycle containing \(x,y\).
  • if \(y\) has not been traversed, it may or may not be in a cycle with \(x\), we traverse its tree to find out.
  • if \(y\)'s tree is in the process of being traversed (as is \(x\)'s), then we know edge \(x\rightarrow y\) completes a cycle that we can then retrace!
topSort' :: (Ord a, MonadState (S a) m, MonadCont m)
         => AdjacencyMap a -> m (Either (Cycle a) [a])
topSort' g = callCC $ \cyclic ->
  do let vertices = map fst $ Map.toDescList $ adjacencyMap g
         adjacent = Set.toDescList . flip postSet g
         dfsRoot x = nodeState x >>= \case
           Nothing -> enterRoot x >> dfs x >> exit x
           _       -> return ()
         dfs x = forM_ (adjacent x) $ \y ->
                   nodeState y >>= \case
                     Nothing      -> enter x y >> dfs y >> exit y
                     Just Exited  -> return ()
                     Just Entered -> cyclic . Left . retrace x y =<< gets parent
     forM_ vertices dfsRoot
     Right <$> gets order

For completeness, the internal helper functions. When entering a vertex \(v\) through \(u \rightarrow v\), record this. When exiting a vertex \(v\), push it into the topological ordering we're building up. Whereas cycles had a nice and easy representation in haskell types, I don't believe it's easy to express that exit never leaves the same vertex twice (fancier types needed?)… If you know how, please share! For time being, the wart with error stays…

    nodeState v = gets (Map.lookup v . entry)
    enter u v = modify' (\(S m n vs) -> S (Map.insert v u m)
                                          (Map.insert v Entered n)
                                          vs)
    enterRoot v = modify' (\(S m n vs) -> S m (Map.insert v Entered n) vs)
    exit v = modify' (\(S m n vs) -> S m (Map.alter (fmap leave) v n) (v:vs))
      where leave = \case
              Entered -> Exited
              Exited  -> error "Internal error: dfs search order violated"
    retrace curr head parent = aux (curr :| []) where
      aux xs@(curr :| _)
        | head == curr = xs
        | otherwise = aux (parent Map.! curr <| xs)

Everything so far has been inner/private/local to the module. At last, here is the exposed wrapper:

topSort :: Ord a => AdjacencyMap a -> Either (Cycle a) [a]
topSort g = runContT (evalStateT (topSort' g) initialState) id where
  initialState = S Map.empty Map.empty []

Created: 2020-05-31 Sun 17:18