HiveBrain v1.2.0
Get Started
← Back to all entries
snippetMinor

F# program to generate a Map of substrings frequencies

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
frequenciesmapsubstringsprogramgenerate

Problem

I wrote the following F# program that takes an input txt file and create a Map that contains all possible substrings of size N and their frequency in the file.

In other words if I call the program with a text file with the following content:


James John Max Gary Jess Gilles Mary

With a length of 2, the probabilityTable contains a value with a key of "ar" with a value of 0.054054.

Here is the program:

```
module NameGenerator

open System
open System.Collections.Generic

// Open a file and read all lines into IEnumerable
let readInputFile (filePath:string) =
System.IO.File.ReadAllText(filePath)

// Parses a string and count the total number of occurrences of substrings of size length
let rec countOccurrences (input:string) (occurrenceTable:Map) length =
let adjLen = length - 1

match input |> Seq.toList with
| head :: tail when tail.Length >= adjLen ->
let other = Seq.take adjLen tail |> Seq.toList
let occurrence = (head :: other |> Array.ofList |> String)

// add current occurrence to the occurrence table
let updatedMap = match occurrenceTable.ContainsKey (occurrence) with
| true -> occurrenceTable.Add(occurrence, occurrenceTable.[occurrence] + 1.0)
| false -> occurrenceTable.Add(occurrence, 1.0)

// call the function recursively with the rest of the string
countOccurrences (tail |> Array.ofList |> String) updatedMap length
| _ -> occurrenceTable

// Given a map that contains the a collection of string and their respective counts, return a
// frequency map that is obtained by count / total count
let buildFrenquencyTable (occurrenceTable:Map) =
// fold occurence dict, count total
let total = Map.fold (fun acc key value -> acc + value) 0.0 occurrenceTable

// map over previous map and replace values with count / total count
Map.map (fun key value -> Math.Round(value / total, 6)) occurrenceTable

// Given an input file cr

Solution

Fortunately you're already following the functional idioms entirely. So that's a very good thing.

The only things I would comment on are a few minor details:

let occurrence = (head :: other |> Array.ofList |> String)


Those parenthesis are superfluous.

let updatedMap = match occurrenceTable.ContainsKey (occurrence) with
                 | true -> occurrenceTable.Add(occurrence, occurrenceTable.[occurrence] + 1.0)
                 | false -> occurrenceTable.Add(occurrence, 1.0)


In this case, you have a lot of whitespace to the left of the match conditions, usually in this case I just break the match to a new line and then work from there.

let updatedMap =
    match occurrenceTable.ContainsKey (occurrence) with
    | true -> occurrenceTable.Add(occurrence, occurrenceTable.[occurrence] + 1.0)
    | false -> occurrenceTable.Add(occurrence, 1.0)


It's generally easier to read.

let total = Map.fold (fun acc key value -> acc + value) 0.0 occurrenceTable


I absolutely love to see fold used! I think it's far overlooked and definitely has many use cases.

In regards to this method:

let rec countOccurrences (input:string) (occurrenceTable:Map) length =


You nailed tail-call recursion. I would be willing to bet that if you looked at the IL for this code you would see a beautiful tail. call in there. Excellent work!

Overall, beautiful work. I think you did a phenomenal job with this code. :)

Code Snippets

let occurrence = (head :: other |> Array.ofList |> String)
let updatedMap = match occurrenceTable.ContainsKey (occurrence) with
                 | true -> occurrenceTable.Add(occurrence, occurrenceTable.[occurrence] + 1.0)
                 | false -> occurrenceTable.Add(occurrence, 1.0)
let updatedMap =
    match occurrenceTable.ContainsKey (occurrence) with
    | true -> occurrenceTable.Add(occurrence, occurrenceTable.[occurrence] + 1.0)
    | false -> occurrenceTable.Add(occurrence, 1.0)
let total = Map.fold (fun acc key value -> acc + value) 0.0 occurrenceTable
let rec countOccurrences (input:string) (occurrenceTable:Map<string, float>) length =

Context

StackExchange Code Review Q#151286, answer score: 2

Revisions (0)

No revisions yet.