patternMinor
Processing a tab-delimited file with vertexes and ages
Viewed 0 times
filevertexesdelimitedwithtabagesandprocessing
Problem
The following block of code needs to be as fast as possible as it likely to be called many thousands of times in a run.
I am also conscious that my thinking style is leaning towards a more procedural style from time to time and hence I may not be taking full advantage of all the benefits that functional programming brings.
Any tips for
I am also conscious that my thinking style is leaning towards a more procedural style from time to time and hence I may not be taking full advantage of all the benefits that functional programming brings.
Any tips for
- making the code run faster, and
- making it more functional
let qarray (tableName : string) (startAge : int) (startYear : int) =
let tableData = File.ReadAllLines tableName
// find which row the vertex is in
let vrow = Array.findIndex (fun (s:string) -> s.StartsWith vertexLiteral) tableData
let firstYear = int(tableData.[vrow].Split('\t').[1])
// filter out all the row prior to the column headers e.g. table description and comments
let filteredArray = Array.sub tableData (vrow+1) (tableData.Length-vrow-1)
// use the vertex info to read all lines beyond that, converting to doubles
let f (s:string) = s.Split('\t') |> Array.map double
let fullArray = Array.map f filteredArray
[| for i in 0 .. (120 - startAge - 1) -> fullArray.[startAge + i - 1].[System.Math.Min(startYear - firstYear + i + 1, fullArray.[0].Length)] |]Solution
You should indent the body of the
You should also rename your function
If you're using .net 4.0, you can (and should) use
Note that since
As I said, you can no longer use an index based approach here. The idiomatic way to get rid of certain elements at the beginning of a sequence is to use
(Note that by using
Now we can use
Those lines are fine except that you now need to use
Ok, here 120 is a magic number, which you did not explain (which you should fix by documenting its meaning), so I'm not sure whether you know that the table will have exactly 120 elements and you have the number in there to avoid a bounds violation or whether the table can contain more than 120 elements and you only want to take the first 120. I'm going to assume the latter is the case.
Further it is not clear to me why you use
So what you're doing here is basically to skip the first
Since this is still a bit long, it might be worthwhile to factor out
qarray function to make it more obvious where it starts and where it's end (I mean I know it ends at the end of the file in this case, but that's not immediately apparent at first glance).You should also rename your function
f into something more descriptive.let tableData = File.ReadAllLines tableNameIf you're using .net 4.0, you can (and should) use
ReadLines instead of ReadAllLines as ReadAllLines reads the whole file into memory before you can start to iterate, while ReadLines loads the file lazily which will be faster for large files.Note that since
ReadLines returns a sequence, not an array, you can not access it using indices any longer. However this is a good thing as getting rid of the indices will lead to more functional code.// find which row the vertex is in
let vrow = Array.findIndex (fun (s:string) -> s.StartsWith vertexLiteral) tableData
let firstYear = int(tableData.[vrow].Split('\t').[1])
// filter out all the row prior to the column headers e.g. table description and comments
let filteredArray = Array.sub tableData (vrow+1) (tableData.Length-vrow-1)As I said, you can no longer use an index based approach here. The idiomatic way to get rid of certain elements at the beginning of a sequence is to use
skipWhile. Since the first element we want is the one that starts with vertexLiteral, we skip elements while they do not start with vertexLiteral:// Skip all rows up to the one the vertex is in
let relevantRows = tableData |> Seq.skipWhile (fun s -> not s.StartsWith vertexLiteral)(Note that by using
|> to write tableData first I allowed F# to infer the type of s and thus didn't need a type annotation.)Now we can use
Seq.head to get the first row of relevantRows (the one with the vertex) and Seq.skip 1 to get the remaining rows. So the following lines become:let firstYear = int((Seq.head relevantRows).Split('\t').[1])
let filteredRows = Seq.skip 1 relevantRows// use the vertex info to read all lines beyond that, converting to doubles
let f (s:string) = s.Split('\t') |> Array.map double
let fullArray = Array.map f filteredArrayThose lines are fine except that you now need to use
Seq.map instead of Array.map (at least on the first line, the second one may stay Array.map as Split still returns an array, but there's no harm in using Seq instead), f needs a better name and fullArray needs to be renamed because it's not an array anymore.[| for i in 0 .. (120 - startAge - 1) -> fullArray.[startAge + i - 1].[System.Math.Min(startYear - firstYear + i + 1, fullArray.[0].Length)] |]Ok, here 120 is a magic number, which you did not explain (which you should fix by documenting its meaning), so I'm not sure whether you know that the table will have exactly 120 elements and you have the number in there to avoid a bounds violation or whether the table can contain more than 120 elements and you only want to take the first 120. I'm going to assume the latter is the case.
Further it is not clear to me why you use
fullArray.[0].Length instead of fullArray.[startAge + i - 1]. I'm going to assume that all rows have the same length and you chose 0 over startAge + i - 1 for simplicity's sake.So what you're doing here is basically to skip the first
startAge - 2 elements then indexing into each remaining element using the minimum of its length and startYear - firstYear + i + 1 as the index. This can be achieved nicely without an index based loop by using Seq.skip followed by mapi (map with an index), like this:fullSequence |> Seq.skip (startAge - 2) |>
Seq.mapi (fun i cols -> cols.[System.Math.Min(startYear - firstYear + i + 1, cols.Length)])Since this is still a bit long, it might be worthwhile to factor out
fun i cols -> cols.[System.Math.Min(startYear - firstYear + i + 1, cols.Length)] into a named function.Code Snippets
let tableData = File.ReadAllLines tableName// find which row the vertex is in
let vrow = Array.findIndex (fun (s:string) -> s.StartsWith vertexLiteral) tableData
let firstYear = int(tableData.[vrow].Split('\t').[1])
// filter out all the row prior to the column headers e.g. table description and comments
let filteredArray = Array.sub tableData (vrow+1) (tableData.Length-vrow-1)// Skip all rows up to the one the vertex is in
let relevantRows = tableData |> Seq.skipWhile (fun s -> not s.StartsWith vertexLiteral)let firstYear = int((Seq.head relevantRows).Split('\t').[1])
let filteredRows = Seq.skip 1 relevantRows// use the vertex info to read all lines beyond that, converting to doubles
let f (s:string) = s.Split('\t') |> Array.map double
let fullArray = Array.map f filteredArrayContext
StackExchange Code Review Q#1362, answer score: 6
Revisions (0)
No revisions yet.