patternModerate
Graph generator for undirected graphs
Viewed 0 times
graphgraphsgeneratorundirectedfor
Problem
I am still pretty new to Haskell and am working on a graph generator for undirected unlabeled graphs containing loops and I have a bottleneck in the following functions. So far, I have not given any thought to performance and just looked for correctness.
I know this kind of question is not really popular, but I'd be interested in general guidelines for improving performance given some naive but correct implementation like the following.
For instance:
Additional info:
I don't think you need to understand all the details. I'm looking more for micro-improvements (the following determines arcs from a given node to equivalence classes of vertices grouped by their degree).
```
import Data.List
import Control.Monad
type Arcs = Int
type Count = Int
type Vertex = Int
type MSequence = [Int]
data EquivClass = EC { order :: Int, verts :: [Vertex] } deriving (Eq, Show)
type ECPartition = [EquivClass]
type NodeSelection = [(Arcs,Count)]
type ECNodeSelection = (EquivClass, NodeSelection)
-- number of occurences of unique elements in a list
occurences :: (Ord a) => [a] -> [(a, Int)]
occurences = map (\xs@(x:_) -> (x, length xs)) . group . reverse . sort
-- number of vertices in an equivalence class
ecLength :: EquivClass -> Int
ecLength = length . verts
-- for a given y = (y_1,...,y_n) and a bound m, find all vectors
-- x = (x_1,...,x_n) such that |x| = m and x_i a -> [a] -> [[a]]
boundSequences m x | m sum x == m)
ranges = map (\x -> [0..x])
-- return m-sequences (combinations of number of arcs) for the given
-- order
I know this kind of question is not really popular, but I'd be interested in general guidelines for improving performance given some naive but correct implementation like the following.
For instance:
- Where to enforce strictness?
- Should I use vector/array instead of lists? When, where and why?
- Should I improve single functions by replacing recursion using folds/maps/etc. or similar?
Additional info:
- I'm using -O2 with
ghc
- profiling shows (when normalizing runtime of
connectionsto 100%)
arqSeq40% (with almost all time spend inboundsequences)
connectionCombinations60% with a quarter of the time spend inoccurences
I don't think you need to understand all the details. I'm looking more for micro-improvements (the following determines arcs from a given node to equivalence classes of vertices grouped by their degree).
```
import Data.List
import Control.Monad
type Arcs = Int
type Count = Int
type Vertex = Int
type MSequence = [Int]
data EquivClass = EC { order :: Int, verts :: [Vertex] } deriving (Eq, Show)
type ECPartition = [EquivClass]
type NodeSelection = [(Arcs,Count)]
type ECNodeSelection = (EquivClass, NodeSelection)
-- number of occurences of unique elements in a list
occurences :: (Ord a) => [a] -> [(a, Int)]
occurences = map (\xs@(x:_) -> (x, length xs)) . group . reverse . sort
-- number of vertices in an equivalence class
ecLength :: EquivClass -> Int
ecLength = length . verts
-- for a given y = (y_1,...,y_n) and a bound m, find all vectors
-- x = (x_1,...,x_n) such that |x| = m and x_i a -> [a] -> [[a]]
boundSequences m x | m sum x == m)
ranges = map (\x -> [0..x])
-- return m-sequences (combinations of number of arcs) for the given
-- order
Solution
-- for a given y = (y_1,...,y_n) and a bound m, find all vectors
-- x = (x_1,...,x_n) such that |x| = m and x_i a -> [a] -> [[a]]
boundSequences m x | m sum x == m)
ranges = map (\x -> [0..x])Excepting some fortunate cases, you are wasting a lot of work here.
sequence . ranges produces (y_1+1) ... (y_n+1) lists you have to check. If you write a function to produce only the successful lists, you can gain a lot of performance for larger input (it won't do too much for short lists with small elements, but it will help for those too).Shouldn't the case
sum x < m return [] rather than [[]]?-- return all the possible arc combinations to an equivalence
-- class for a given number of arcs
connectionCombinations :: Int -> EquivClass -> [NodeSelection]
connectionCombinations arcs = map groupOcc . prune arcs . sequence . orderRep
where orderRep (EC o v) = replicate (length v) [0..o]
prune a = nub . map (reverse . sort) . filter ((== a) . sum)
groupOcc = filter ((/= 0) . fst) . occurencesInstead of
reverse . sort, use sortBy (flip compare). That won't make much difference for short lists, but it's cleaner, IMO. prune arcs . sequence contains another occurrence of the boundSequences problem, creating a lot of lists and filtering out most of them immediately. nub is not good, it is quadratic (worst case, O(total*distinct) in general). The things you're nubbing have an Ord instance, if the order doesn't matter, map head . group . sort is a much faster nub than nub (still better Data.Set.toList . Data.Set.fromList), if order matters, keep a Set of seen elements and for each new, output that and add it to the set of seen.I've not yet profiled, but it may be that the profile is misleading because it attributes costs to the functions where results are forced rather than where they are calculated, that needs some investigation, check back later.
Okay, the given data doesn't run long enough to produce a meaningful profile, so I changed it to
let cons = connections [EC 4 [1 .. 6], EC 6 [2 .. 9]] 5That runs long enough to collect a handful of samples. Unfortunately, the profile wasn't very informative:
total time = 7.90 secs (395 ticks @ 20 ms)
total alloc = 8,098,386,128 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
connectionCombinations Main 100.0 100.0So I inserted a couple of
{-# SCC #-} pragmas. The biggo in that programme is the sequence . orderRep, unsurprisingly. A rewrite along the lines mentioned above,boundSequences :: (Num a, Ord a, Enum a) => a -> [a] -> [[a]]
boundSequences m xs
| sm < m = []
| sm == m = [xs]
| otherwise = go sm m xs
where
sm = sum xs
go _ r []
| r == 0 = [[]]
| otherwise = []
go _ r [y]
| y < r = []
| otherwise = [[r]]
go s r (y:ys) = do
let mny | s < r+y = r+y-s
| otherwise = 0
mxy = min y r
c <- [mny .. mxy]
map (c:) (go (s-y) (r-c) ys)and using that in
connectionCombinations instead of prune arcs . sequence (requires a small change in orderRep),connectionCombinations :: Int -> EquivClass -> [NodeSelection]
connectionCombinations arcs = map groupOcc . nub . map (sortBy (flip compare)) .
boundSequences arcs . orderRep
where orderRep (EC o v) = replicate (length v) o
groupOcc = filter ((/= 0) . fst) . occurencesbrought the running time (ghc -O2, no profiling) down significantly:
dafis@schwartz:~/Cairo> time ./orArcs > /dev/null
real 0m5.836s
user 0m5.593s
sys 0m0.231s
dafis@schwartz:~/Cairo> time ./arcs > /dev/null
real 0m0.008s
user 0m0.005s
sys 0m0.003sWhether any further optimisations (e.g. the abovementioned
nub) are necessary, and where most of the time in the actual programme is spent, so that one could identify the most important points, I cannot tell without more realistic data.Code Snippets
-- for a given y = (y_1,...,y_n) and a bound m, find all vectors
-- x = (x_1,...,x_n) such that |x| = m and x_i <= y_i
boundSequences :: (Num a, Ord a, Enum a) => a -> [a] -> [[a]]
boundSequences m x | m <= sum x = (fByM . sequence . ranges) x
| otherwise = [[]]
where fByM = filter (\x -> sum x == m)
ranges = map (\x -> [0..x])-- return all the possible arc combinations to an equivalence
-- class for a given number of arcs
connectionCombinations :: Int -> EquivClass -> [NodeSelection]
connectionCombinations arcs = map groupOcc . prune arcs . sequence . orderRep
where orderRep (EC o v) = replicate (length v) [0..o]
prune a = nub . map (reverse . sort) . filter ((== a) . sum)
groupOcc = filter ((/= 0) . fst) . occurenceslet cons = connections [EC 4 [1 .. 6], EC 6 [2 .. 9]] 5total time = 7.90 secs (395 ticks @ 20 ms)
total alloc = 8,098,386,128 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
connectionCombinations Main 100.0 100.0boundSequences :: (Num a, Ord a, Enum a) => a -> [a] -> [[a]]
boundSequences m xs
| sm < m = []
| sm == m = [xs]
| otherwise = go sm m xs
where
sm = sum xs
go _ r []
| r == 0 = [[]]
| otherwise = []
go _ r [y]
| y < r = []
| otherwise = [[r]]
go s r (y:ys) = do
let mny | s < r+y = r+y-s
| otherwise = 0
mxy = min y r
c <- [mny .. mxy]
map (c:) (go (s-y) (r-c) ys)Context
StackExchange Code Review Q#6488, answer score: 11
Revisions (0)
No revisions yet.