patternMinor
Anagram finder in F#
Viewed 0 times
anagramfinderstackoverflow
Problem
I've been learning F# for a month or so and I'm wondering how good my "functional" aspect of coding is. When I first started, I did this using an iterative approach with a lot of <- and mutables etc. I tried to rewrite it as functional as possible. Any critique is appreciated.
This code prints out all anagrams in the english dictionary read from a "word.txt" file, the LONGEST anagram, and the anagram with the MOST permutations.
```
open System
let sortStringAsKey (listOfWords : string array) =
let getKey (str : string) =
str.ToCharArray()
|> Array.sort
|> String
listOfWords
|> Array.groupBy getKey
|> Array.filter (fun (sortedWord, originalWord) -> originalWord.Length > 1)
[]
let main argv =
let filename = "words.txt"
let listOfWords = System.IO.File.ReadAllLines(filename)
let listOfAnagrams = sortStringAsKey listOfWords
// Prints every single anagram combination
listOfAnagrams |> Array.iter (fun (_ , anagramList) -> anagramList |> Array.iter (fun str -> printfn "%s" str); printfn "")
// Gets the LONGEST anagram
let longestAnagrams = listOfAnagrams
|> Array.filter (fun (sortedWord, _) -> sortedWord.Length >= (listOfAnagrams
|> Array.maxBy (fun (sortedWord, _)-> sortedWord.Length)
|> fst
|> String.length))
// Gets the set that has the MOST anagrams
let mostAnagrams = listOfAnagrams
|> Array.filter (fun (_, originalWords) -> originalWords.Length >= (listOfAnagrams
|> Array.maxBy (fun (_, originalWords) -> originalWords.Length)
This code prints out all anagrams in the english dictionary read from a "word.txt" file, the LONGEST anagram, and the anagram with the MOST permutations.
```
open System
let sortStringAsKey (listOfWords : string array) =
let getKey (str : string) =
str.ToCharArray()
|> Array.sort
|> String
listOfWords
|> Array.groupBy getKey
|> Array.filter (fun (sortedWord, originalWord) -> originalWord.Length > 1)
[]
let main argv =
let filename = "words.txt"
let listOfWords = System.IO.File.ReadAllLines(filename)
let listOfAnagrams = sortStringAsKey listOfWords
// Prints every single anagram combination
listOfAnagrams |> Array.iter (fun (_ , anagramList) -> anagramList |> Array.iter (fun str -> printfn "%s" str); printfn "")
// Gets the LONGEST anagram
let longestAnagrams = listOfAnagrams
|> Array.filter (fun (sortedWord, _) -> sortedWord.Length >= (listOfAnagrams
|> Array.maxBy (fun (sortedWord, _)-> sortedWord.Length)
|> fst
|> String.length))
// Gets the set that has the MOST anagrams
let mostAnagrams = listOfAnagrams
|> Array.filter (fun (_, originalWords) -> originalWords.Length >= (listOfAnagrams
|> Array.maxBy (fun (_, originalWords) -> originalWords.Length)
Solution
This looks like a good start, so the following is by no means a criticism, but there are various refactorings you can perform to make the code smaller, and more generic.
As a general comment, I'd advise against naming arrays listOfWhatever, since
Finding the anagrams
The first
You'll notice that I inlined the
I've also changed from working explicitly with arrays, to using the
You could even perform an eta reduction on the function, in order to make it even shorter, but I'm not sure it becomes more readable by it:
Just for the fun, you can make it even more cryptic:
Personally, I don't even find that readable myself; I'd prefer the first, most verbose option.
Printing the anagrams
Since there's at least three places where the code prints out the anagrams, it's more reasonable to turn that into a function:
Once again, you'll notice that I chose to use
Loading anagrams
Loading the anagrams from file can be simplified a bit as well, since you don't need the intermediate
Because of the more generic version of
Printing all the anagrams
With the
Instead of performing work inside on of
Finding the longest anagrams
The proposed solution suffers from calling
You can make it more efficient by only doing it once:
This still requires two traversals of the array, so isn't as efficient as it could be, but is probably (but measure instead of assume, when it comes to performance) more efficient than doing
See below for an optional refactoring.
Finding the words with most anagrams
Likewise, you can find the largest collections of anagrams:
Notice how this is similar to
Printing
Both
Alternative max and filter operation
The problem with even the refactored version of
Once you realise that the operation is essentially a fold, you may want to optimise it to a single traversal:
As you can tell, it's more code (so more complicated), but at least in theory more efficient, as it only uses a single traversal. It does, however, potentially causes more memory allocations, so as always when performance is involved: measure.
On my machine, though, it seems to be more than twice as fast...
You can ref
As a general comment, I'd advise against naming arrays listOfWhatever, since
list is a concrete data type in F#, separate from arrays. I've renamed listOfWords to words, listOfAnagrams to anagrams, and so on.Finding the anagrams
The first
sortStringAsKey function doesn't need type annotations if you refactor it a bit:// seq -> seq> when 'a :> seq
let sortStringAsKey words =
words
|> Seq.groupBy (Seq.sort >> Seq.toArray >> String)
|> Seq.filter (fun (_, originalWords) -> Seq.length originalWords > 1)You'll notice that I inlined the
getKey function, but perhaps that's taking it too far. If you think that having a named local function makes the code more readable, I wouldn't disagree.I've also changed from working explicitly with arrays, to using the
Seq module. This makes the function more generic, but it can still handle arrays.You could even perform an eta reduction on the function, in order to make it even shorter, but I'm not sure it becomes more readable by it:
let sortStringAsKey =
Seq.groupBy (Seq.sort >> Seq.toArray >> String)
>> Seq.filter (fun (_, originalWords) -> Seq.length originalWords > 1)Just for the fun, you can make it even more cryptic:
let sortStringAsKey =
Seq.groupBy (Seq.sort >> Seq.toArray >> String)
>> Seq.filter (snd >> Seq.length >> ((<) 1))Personally, I don't even find that readable myself; I'd prefer the first, most verbose option.
Printing the anagrams
Since there's at least three places where the code prints out the anagrams, it's more reasonable to turn that into a function:
// seq -> unit
let prints a =
a |> Seq.iter (printfn "%s")
printfn ""Once again, you'll notice that I chose to use
Seq.iter instead of Array.iter. It'll still be able to handle arrays, but there's no reason to constrain the input if it isn't necessary.Loading anagrams
Loading the anagrams from file can be simplified a bit as well, since you don't need the intermediate
listOfWords value:// In main function:
let filename = "../../words.txt"
let anagrams = System.IO.File.ReadAllLines(filename) |> sortStringAsKeyBecause of the more generic version of
sortStringAsKey, the type of anagrams is seq>.Printing all the anagrams
With the
prints function, you can easily print all the anagramsanagrams |> Seq.map snd |> Seq.iter printsInstead of performing work inside on of
Seq.iter (which is possible), I often prefer to perform transformations etcetera first, because I can easily test such pure functions. Once you have data in the appropriate shape, you can always use Seq.iter to e.g. print it.Finding the longest anagrams
The proposed solution suffers from calling
Array.maxBy for every element, so it'd be quite inefficient.You can make it more efficient by only doing it once:
let longestAnagrams =
let longest = anagrams |> Seq.map (fst >> String.length) |> Seq.max
anagrams |> Seq.filter (fst >> String.length >> ((=) longest))This still requires two traversals of the array, so isn't as efficient as it could be, but is probably (but measure instead of assume, when it comes to performance) more efficient than doing
Array.maxBy for every entry.See below for an optional refactoring.
Finding the words with most anagrams
Likewise, you can find the largest collections of anagrams:
let mostAnagrams =
let most = anagrams |> Seq.map (snd >> Seq.length) |> Seq.max
anagrams |> Seq.filter (snd >> Seq.length >> ((=) most))Notice how this is similar to
longestAnagrams.Printing
Both
longestAnagrams and mostAnagrams can be printed using the prints function:longestAnagrams |> Seq.map snd |> Seq.iter prints
mostAnagrams |> Seq.map snd |> Seq.iter printsAlternative max and filter operation
The problem with even the refactored version of
longestAnagrams is that it requires two traversals of the sequence in order to compute the results.Once you realise that the operation is essentially a fold, you may want to optimise it to a single traversal:
let longestAnagrams =
let folder l (k, v) =
let xl = Seq.length k
match xl, l with
| _, [] -> [k, v]
| x, (hk, _)::_ when xl > Seq.length hk -> [k, v]
| x, (hk, hv)::t when xl = Seq.length hk -> (k, v)::(hk, hv)::t
| _ -> l
anagrams |> Seq.fold folder []As you can tell, it's more code (so more complicated), but at least in theory more efficient, as it only uses a single traversal. It does, however, potentially causes more memory allocations, so as always when performance is involved: measure.
On my machine, though, it seems to be more than twice as fast...
You can ref
Code Snippets
// seq<'a> -> seq<string * seq<'a>> when 'a :> seq<char>
let sortStringAsKey words =
words
|> Seq.groupBy (Seq.sort >> Seq.toArray >> String)
|> Seq.filter (fun (_, originalWords) -> Seq.length originalWords > 1)let sortStringAsKey =
Seq.groupBy (Seq.sort >> Seq.toArray >> String)
>> Seq.filter (fun (_, originalWords) -> Seq.length originalWords > 1)let sortStringAsKey =
Seq.groupBy (Seq.sort >> Seq.toArray >> String)
>> Seq.filter (snd >> Seq.length >> ((<) 1))// seq<string> -> unit
let prints a =
a |> Seq.iter (printfn "%s")
printfn ""// In main function:
let filename = "../../words.txt"
let anagrams = System.IO.File.ReadAllLines(filename) |> sortStringAsKeyContext
StackExchange Code Review Q#123351, answer score: 5
Revisions (0)
No revisions yet.