Simple command-line password generator

Problem

I'm very new to Haskell, I've read most of learnyouahaskell.com and played around with some simple things, but this is probably the most 'complicated' bit of Haskell I've written so far. I have an implementation in PHP that does basically the same thing, but it is much much faster.. I'm guessing my bottleneck here is that randomRIO takes a long time to return a value, is there any way to increase the speed, or get an array of values instead of grabbing them one at a time?

That aside, any other tips and/or suggestions about improving my implementation would be very helpful!

module Main where

import System.Environment
import System.Exit
import System.Random (randomRIO)
import Control.Monad (replicateM)
import qualified Data.Map as M

import Text.Regex.Posix

main :: IO ()
main = do
    args /?"

randomReplace :: String -> String -> IO String
randomReplace [] subject = return subject
randomReplace (replacement:rs) subject = do
    randomIndex  IO String
generatePassword withSymbols = do
    let passwordLength = 500 -- obviously you wouldn't use 500 as a default here, but I'm just benchmarking
    numDigits >= (\x -> return $ symbols !! x))
            randomReplace symbolsToReplace p3
        else return p3
    if (password =~ "[a-z]" :: Bool) && (password =~ "[A-Z]" :: Bool) && (password =~ "[0-9]" :: Bool) -- we knows symbols are in there since it went last
        then return password
        else generatePassword withSymbols

Solution

Too much IO.

Text.Regex.Posix is too much for those simple tests. Why not any isUpper, any isLower, any isDigit?

Your program ignores --help and --with-symbols (have a look at OptParse-Applicative)

randomReplace has too many temporary structures (see below)

You use length too often. Unlike in PHP, it's quite slow (O(n) vs O(1)).

A short search for bottlenecks

I'm guessing my bottleneck here ...

Don't guess, measure. Enable profiling to check where you're actually losing time:

$ stack install --profile random
$ stack install --profile regex-posix
$ stack exec -- ghc -O2 -prof -auto-all PWGen.hs
$ ./PWGen+RTS -s -p

If you don't use stack, make sure to enable profiling when you install the packages:

$ cabal sandbox init
$ cabal install -p random regex-posix
$ cabal exec -- ghc -O2 -prof -auto-all PWGen.hs
$ ./PWGen +RTS -s -p

This is actually fast, but not fast enough:

9,420,841,024 bytes allocated in the heap
6,425,011,808 bytes copied during GC
1,021,088 bytes maximum residency (2915 sample(s))
59,672 bytes maximum slop
4 MB total memory in use (0 MB lost due to fragmentation)

Tot time (elapsed) Avg pause Max pause
Gen 0 15199 colls, 0 par 1.844s 1.971s 0.0001s 0.0011s
Gen 1 2915 colls, 0 par 1.188s 1.076s 0.0004s 0.0009s

INIT time 0.000s ( 0.001s elapsed)
MUT time 1.953s ( 2.001s elapsed)
GC time 3.031s ( 3.048s elapsed)
RP time 0.000s ( 0.000s elapsed)
PROF time 0.000s ( 0.000s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 5.000s ( 5.050s elapsed)

%GC time 60.6% (60.4% elapsed)

Alloc rate 4,823,470,604 bytes per MUT second

Productivity 39.4% of total user, 39.0% of total elapsed

Note: I've changed passwordLength to 5000, since it was too quick with 500. However, the times above don't give enough information where you actually lose that time. That's what -p was for. PWGen.prof contains the following data:

COST CENTRE MODULE no. entries %time %alloc %time %alloc

MAIN MAIN 59 0 0.0 0.0 100.0 100.0
main Main 119 0 0.8 0.0 99.9 100.0
generatePassword Main 121 1 0.4 0.1 99.1 100.0
randomReplace Main 124 5730 69.5 62.9 98.7 99.9
randomReplace.hash Main 125 5728 29.2 36.9 29.2 36.9
main.helpOnly Main 120 1 0.0 0.0 0.0 0.0

Almost all memory is allocated in randomReplace. After all, you break both the map and the list apart for every character in replacement. That's quite costly. Instead, let's try to stay on a single map as long as possible:

-- This is still not an idiomatic version, but better
randomReplace :: String -> String -> IO String
randomReplace rs subject = fmap (map snd . M.toList) $ go rs $ M.fromList $ zip [0..] subject
  where
    l = length subject
    go []     m = return m
    go (r:rs) m = do
      randomIndex <- randomRIO (0, l - 1) :: IO Int
      go rs $ M.insert randomIndex r m

What's the big difference? Well, we're not using length repeatedly, which is a big plus. length needs to traverse the whole list to get its result. Also, we're not switching between lists and maps all the time, which gets rid of all intermediate lists.

Running the benchmark again, we gain the following result:

INIT time 0.000s ( 0.001s elapsed)
MUT time 0.016s ( 0.021s elapsed)
GC time 0.016s ( 0.011s elapsed)
RP time 0.000s ( 0.000s elapsed)
PROF time 0.000s ( 0.000s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 0.031s ( 0.033s elapsed)

%GC time 50.0% (32.6% elapsed)

Alloc rate 2,105,396,736 bytes per MUT second

Productivity 50.0% of total user, 47.7% of total elapsed

individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc

MAIN MAIN 59 0 5.6 0.0 100.0 100.0
main Main 119 0 5.6 0.1 94.4 97.8
generatePassword Main 121 1 55.6 44.7 88.9 97.7
randomReplace Main 124 2 0.0 16.4 33.3 53.0
randomReplace.l Main 126

Code Snippets

$ stack install --profile random
$ stack install --profile regex-posix
$ stack exec -- ghc -O2 -prof -auto-all PWGen.hs
$ ./PWGen+RTS -s -p

$ cabal sandbox init
$ cabal install -p random regex-posix
$ cabal exec -- ghc -O2 -prof -auto-all PWGen.hs
$ ./PWGen +RTS -s -p

-- This is still not an idiomatic version, but better
randomReplace :: String -> String -> IO String
randomReplace rs subject = fmap (map snd . M.toList) $ go rs $ M.fromList $ zip [0..] subject
  where
    l = length subject
    go []     m = return m
    go (r:rs) m = do
      randomIndex <- randomRIO (0, l - 1) :: IO Int
      go rs $ M.insert randomIndex r m

k                    -- for every character in rs
  * ( n              -- get the length of subject (every time!)
      + n * log(n)   -- create the map
      + log (n)      -- insert a character at a random position
      + 2 * n        -- zip the list and zip it back
    )

k * (log (n))        -- for every character insert a character into a map
+ n                  -- get the length once(!)
+ n * log (n)        -- create a map once(!)
+ 2 * n              -- zip the map and zip it back

Context

StackExchange Code Review Q#117030, answer score: 4

Revisions (0)

No revisions yet.