Optimization of multiple arrays for filtering


Problem

My app will run on an iPad with an A7 chip. I pass in three JSON arrays and currently do some simple filtering; in the future the operation will be much more complicated. I realize the if statement is slowing down the process, but how can I avoid it? Is there a way to run this loop on one CPU core and the GUI on the other (that is, assign affinity)?

class func filterPhenotypeByPvalues(
    phenotypeJSON:[String],
    SNPJSON:[String],
    dataJSON:[Float],
    limit:Float) -> ([Float],[String],[String])
  {
    let dbg:Bool = true
    var PvalueFiltered:[Float] = []
    var PhenotypeFiltered:[String] = []
    var SNPFiltered:[String] = []
    //initialize arrays with value
    //[Float](count: data.count, repeatedValue: 0.0)

    var i:Int = 0

    let startTime = CFAbsoluteTimeGetCurrent()

    for (i = 0; i < dataJSON.count; i++) {
      if dataJSON[i] <= limit {
        PvalueFiltered.append(-log10(dataJSON[i]))
        PhenotypeFiltered.append(phenotypeJSON[i])
        SNPFiltered.append(SNPJSON[i])
      }
    }

    let timeElapsed = CFAbsoluteTimeGetCurrent() - startTime
    if(dbg){println("Time elapsed filterPhenotypeByPvalues: \(timeElapsed) s")}

    return (PvalueFiltered,PhenotypeFiltered,SNPFiltered)
  }


A very small sample of the JSON data file:

{
    "data": [
        ["rs6855911","Negative control - flare length",0.0000000002],
        ["rs6855911","BMD of intertrochanter region - gm/cm sq",0.0000000007],
        ["rs1501908","BMD of intertrochanter region - gm/cm sq",0.000000001]
    ]
}


[Screenshot of timing profile not included]

Solution

There may well be a way to make this particular loop run faster, but chasing it is a fairly futile effort.

A lot of time is also being spent elsewhere, in creating the three original arrays that are passed in here. The problem calls for a far more object-oriented approach.

I don't know what this code actually represents, but for now, I'm going to use "Phenotype" and assume that's an accurate noun to give to the object that the JSON data is trying to describe.

The way the JSON data is structured suggests that these three data points are tied to each other. We don't need to create three separate arrays and rely on a common index among them to remember what goes with what. Let's make a container to hold all of this information:

struct Phenotype {
    var phenotype: String
    var SNP: String
    var data: Float   // the p-value; numeric so it can be compared against the limit
}


(The struct name and its variable names are probably far from optimal.)

Now, when we parse out the JSON data, we create a single array of instances of our Phenotype struct rather than three arrays of the different types.
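As a rough illustration of that parsing step, here is a minimal sketch using Codable. It assumes each row of the "data" array is ordered as [SNP, phenotype, p-value], as in the sample from the question, and that the p-value field is numeric. The names PhenotypeFile and parsePhenotypes(from:) are hypothetical, introduced only for this example.

```swift
import Foundation

struct Phenotype {
    var phenotype: String
    var SNP: String
    var data: Float
}

// Decodable lives in an extension so the memberwise initializer is kept.
extension Phenotype: Decodable {
    init(from decoder: Decoder) throws {
        // Each row is a positional JSON array: [SNP, phenotype, p-value] (assumed order).
        var row = try decoder.unkeyedContainer()
        SNP = try row.decode(String.self)
        phenotype = try row.decode(String.self)
        // Decode as Double, then narrow to Float to match the struct.
        data = Float(try row.decode(Double.self))
    }
}

// Hypothetical wrapper for the top-level {"data": [...]} object.
struct PhenotypeFile: Decodable {
    var data: [Phenotype]
}

// Hypothetical helper: turns raw JSON bytes into one array of records.
func parsePhenotypes(from json: Data) throws -> [Phenotype] {
    return try JSONDecoder().decode(PhenotypeFile.self, from: json).data
}

let sampleJSON = """
{
    "data": [
        ["rs6855911","Negative control - flare length",0.0000000002],
        ["rs6855911","BMD of intertrochanter region - gm/cm sq",0.0000000007],
        ["rs1501908","BMD of intertrochanter region - gm/cm sq",0.000000001]
    ]
}
"""
```

Calling parsePhenotypes(from: Data(sampleJSON.utf8)) on the sample should yield three records, each carrying its SNP, phenotype, and p-value together.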

Now we don't even need to write the method in your question, because we can do it with the built-in filter method:

let limit = someValue
let dataFromJSON = someFuncCreatingArrayOfPhenotypesFromJSONData(data)

let filteredData = dataFromJSON.filter { $0.data <= limit }


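And filteredData holds exactly the records your method was trying to extract, with each phenotype, SNP, and p-value kept together. (Your method also stored -log10 of each p-value; that transform can be applied to the filtered records afterwards.)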
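The original method also converted each retained p-value to -log10 before returning it. A minimal sketch of how filter and map might combine to reproduce that, assuming the data field is a Float; the sample records and the limit value are made up for illustration:

```swift
import Foundation

struct Phenotype {
    var phenotype: String
    var SNP: String
    var data: Float
}

// Illustrative records taken from the question's sample JSON.
let records = [
    Phenotype(phenotype: "Negative control - flare length", SNP: "rs6855911", data: 0.0000000002),
    Phenotype(phenotype: "BMD of intertrochanter region - gm/cm sq", SNP: "rs6855911", data: 0.0000000007),
    Phenotype(phenotype: "BMD of intertrochanter region - gm/cm sq", SNP: "rs1501908", data: 0.000000001),
]

// Assumed threshold, chosen so that the third record is excluded.
let limit: Float = 0.0000000008

// Keep only the records at or below the limit.
let filtered = records.filter { $0.data <= limit }

// Derive the -log10 scores from the filtered records when they are needed.
let scores = filtered.map { -log10($0.data) }
```

Keeping the raw p-value in the struct and deriving the score on demand means the filter and the transform stay independent, so either can change later without touching the other.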

Context

StackExchange Code Review Q#88920, answer score: 6
