HiveBrain v1.2.0
Get Started
← Back to all entries
patternrubyMinor

Prefix Sum in Ruby, Genomic Range Query from Codility

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
sumgenomicrangequerycodilityrubyprefixfrom

Problem

I'm currently going through some lessons on Codility. I've just spent a couple of hours with GenomicRangeQuery, which is intended to demonstrate the use of prefix sums.

The task description is here. Broadly, the idea is that given a string and a set of arbitrary cut points one should be able to return the lowest value character in a given slice. O(N*M) complexity is what we want to avoid.

My solution scores 100%, but I would still value any feedback on style and readability.

def solution(str, slice_starts, slice_ends)
# Initiate an array to hold rolling totals of character occurrences
prefix_sums = [0,0,0,0]]
# Map possible characters to their positions in the prefix sum array
chars = { "A" => 0, "C" => 1, "G" => 2, "T" => 3 }

str.split('').each_with_index do |char, i|
prefix_sums[i + 1] = prefix_sums[i].clone
prefix_sums[i + 1][chars[char]] += 1
end

slice_starts.each_with_index.map do |slice_start, i|
s = slice_start
e = slice_ends[i]

chars.each do |char, ii|
occurrences = prefix_sums[e + 1][ii] - prefix_sums[s][ii]
break (ii + 1) if occurrences > 0
end
end
end


Updated

Here's my current preferred version using tips from some of the answers below. I should probably also use an array of hashes, rather than arrays, for the prefix sums, but I'm lazy.

def solution(str, slice_starts, slice_ends)
  prefix_sums = [[0] * 4]
  chars = { "A" => 0, "C" => 1, "G" => 2, "T" => 3 }

  str.chars.each_with_index do |char, i|
      prefix_sums[i + 1] = prefix_sums[i].clone
      prefix_sums[i + 1][chars[char]] += 1
  end

  slice_starts.zip(slice_ends).map do |s, e|
    chars.each do |char, ii|
      occurrences = prefix_sums[e + 1][ii] - prefix_sums[s][ii]
      break (ii + 1) if occurrences > 0
    end
  end
end

Solution

Some notes:

  • Use 2-space indentation.



  • [[0,0,0,0]]. Always a space after a comma.



  • str.split('') - >str.chars



  • prefix_sums = [[0,0,0,0]]. This kind of implicit structures hinder readability, I'd simply use a hash. A small space penalty (not in big-Oh, though), better readibility.



  • each_with_index. Use zip instead.



  • As you say, the problem is about getting partial sums (frequencies). You can implement and use a very generic abstraction (Enumerable#scanl), which can be used in more codility problems.



I'd write this purely functional solution:

def solution(input_str, slice_start, slice_end)
  impact_factors = {"A" => 1, "C" => 2, "G" => 3, "T" => 4}
  frequencies = input_str.chars.scanl(Hash.new(0)) do |freq, n|
    freq.merge(n => freq[n] + 1)
  end

  slice_start.zip(slice_end).map do |from, to|
    difference_count_by_factor = frequencies[to+1].map do |n, count| 
      [impact_factors[n], count - frequencies[from][nucleotide]]
    end.to_h
    difference_count_by_factor.reject { |factor, count| count.zero? }.keys.min
  end
end

Code Snippets

def solution(input_str, slice_start, slice_end)
  impact_factors = {"A" => 1, "C" => 2, "G" => 3, "T" => 4}
  frequencies = input_str.chars.scanl(Hash.new(0)) do |freq, n|
    freq.merge(n => freq[n] + 1)
  end

  slice_start.zip(slice_end).map do |from, to|
    difference_count_by_factor = frequencies[to+1].map do |n, count| 
      [impact_factors[n], count - frequencies[from][nucleotide]]
    end.to_h
    difference_count_by_factor.reject { |factor, count| count.zero? }.keys.min
  end
end

Context

StackExchange Code Review Q#101206, answer score: 2

Revisions (0)

No revisions yet.