Rosalind problem "Consensus and Profile"

Submitted by: @import:stackexchange-codereview

Problem

Source: Rosalind ("Consensus and Profile")


Brief summary

                A T C C A G C T
                G G G C A A C T
                A T G G A T C T
DNA Strings     A A G C A A C C
                T T G G A A C T
                A T G C C A T T
                A T G G C A C T

            A   5 1 0 0 5 5 0 0
Profile     C   0 0 1 4 2 0 6 1
            G   1 1 6 3 0 1 0 0
            T   1 5 0 0 0 1 1 6

Consensus       A T G C A A C T




Given: A collection of at most 10 DNA strings of equal length (at most 1 kbp) in FASTA format.


Return: A consensus string and profile matrix for the collection. (If several possible consensus strings exist, then you may return any one of them.)
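Since the input arrives in FASTA format, the strings have to be pulled out of the labelled records before any counting can happen. The following is only a sketch of that step: parse_fasta is a hypothetical helper (it is not the import_lines method used by the model below), and the labels in the sample are made up for illustration.

```ruby
# Hypothetical helper (not part of the original code): parse FASTA text
# into a hash of { label => sequence }, joining sequences that span lines.
def parse_fasta(text)
  records = {}
  label = nil
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    if line.start_with?('>')
      label = line[1..-1]        # drop the leading '>'
      records[label] = ''
    elsif label
      records[label] += line     # append this chunk to the current record
    end
  end
  records
end

# Illustrative input; the labels are invented for this example.
sample = <<EOS
>seq_1
ATCCAGCT
>seq_2
GGGC
AACT
EOS

p parse_fasta(sample).values #=> ["ATCCAGCT", "GGGCAACT"]
```

Only the hash values matter for the rest of the problem, since the labels play no part in the profile or the consensus.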

Model (cons.rb):

#!/usr/bin/env ruby
require_relative '../ie_module'

class DnaConsensus
  include ImportExport
  DNA_BASES = %w(A C G T)

  attr_reader :dna_strings, :consensus, :profile

  def initialize(source = "rosalind_#{current_dir_name}.txt")
    @dna_strings = (source =~ /txt$/ ? import_lines(source) : source).values
    @profile = build_profile
    @consensus = build_consensus
  end

  def to_s
    "#{consensus.join}\n#{stringify(profile)}"
  end

private

  def build_profile
    prof = DNA_BASES.map{|b| [b, []]}.to_h
    dna_strings.map(&:chars).transpose.each.with_object(prof) do |arr, hsh|
      hsh.merge!(hashed(arr)){ |_, oldval, newval| oldval << newval }
    end
  end

  def hashed(arr)
    hsh = arr.group_by(&:chr).map{ |k,v| [k, v.size] }.to_h
    (DNA_BASES - hsh.keys).each { |b| hsh[b] = 0 }
    hsh
  end

  def build_consensus
    dna_strings.first.length.times.with_object([]) do |index, arr|
      arr << profile.max_by{|_, list| list[index]}.first
    end
  end

end

a = DnaConsensus.new
a.export_to_file([a.to_s])


File read/write logic (ie_module.rb):

```
module ImportExport

def export_to_file(result, file = "result_#{current_dir_name}.txt")
File.open(file, 'w') do |f|
result.each{ |val| f /, '')
$' ? hsh[line] = '' : hsh[h
```

Solution

I am going to ignore all the file-reading code, which is extraneous to the problem, and focus on simplifying the code that finds the consensus. It can be done in essentially a single line, the one beginning with consensus = ...; everything else is just setting up the sample data.

transpose gets us the columns, and max_by ... count gets us the most frequently occurring nucleotide in each column:

consensus = matrix.transpose.map {|x| nucleotides.max_by {|n| x.count(n)}}
#=> ["A", "T", "G", "C", "A", "A", "C", "T"]
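To make the two steps concrete, here is a small sketch on three short strings. The data is made up for illustration and is not part of the Rosalind sample; the variable names are likewise just choices for this example.

```ruby
# Three short, equal-length strings (illustrative data only).
rows = %w(ACG ATG TTG).map(&:chars)

# transpose turns the list of rows into a list of columns.
columns = rows.transpose
p columns #=> [["A", "A", "T"], ["C", "T", "T"], ["G", "G", "G"]]

# For a single column, count each nucleotide and keep the most frequent one.
nucleotides = %w(A C G T)
first_column_winner = nucleotides.max_by { |n| columns[0].count(n) }
p first_column_winner #=> "A"

# Doing the same for every column gives the consensus.
consensus = columns.map { |col| nucleotides.max_by { |n| col.count(n) } }.join
p consensus #=> "ATG"
```

None of these columns has a tie. Where two nucleotides share the top count, the problem statement accepts either one.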

Code Snippets

matrix = <<EOS
A T C C A G C T
G G G C A A C T
A T G G A T C T
A A G C A A C C
T T G G A A C T
A T G C C A T T
A T G G C A C T
EOS
.split("\n").map{|x| x.split(' ')}

nucleotides = %w(A C G T)

consensus = matrix.transpose.map {|x| nucleotides.max_by {|n| x.count(n)}}

p consensus #=> ["A", "T", "G", "C", "A", "A", "C", "T"]
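
The snippet above stops at the consensus, while the problem also asks for the profile matrix. A minimal sketch of one way to get both from the same transposed columns follows. The names dna_strings, columns and profile are illustrative, and the printed layout (one "A: ..." line per nucleotide) is an assumption about the expected output rather than something stated in this entry.

```ruby
# Sample strings from the problem summary, written without spaces.
dna_strings = %w(
  ATCCAGCT
  GGGCAACT
  ATGGATCT
  AAGCAACC
  TTGGAACT
  ATGCCATT
  ATGGCACT
)

nucleotides = %w(A C G T)
columns = dna_strings.map(&:chars).transpose

# profile maps each nucleotide to its count in every column.
profile = nucleotides.map { |n| [n, columns.map { |col| col.count(n) }] }.to_h

# The consensus picks, for each column index, the nucleotide with the highest count.
consensus = columns.each_index.map { |i| nucleotides.max_by { |n| profile[n][i] } }.join

puts consensus
profile.each { |n, counts| puts "#{n}: #{counts.join(' ')}" }
# Prints:
# ATGCAACT
# A: 5 1 0 0 5 5 0 0
# C: 0 0 1 4 2 0 6 1
# G: 1 1 6 3 0 1 0 0
# T: 1 5 0 0 0 1 1 6
```

Building the profile first and reading the consensus off it avoids counting each column twice, at the cost of an extra line.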

Context

StackExchange Code Review Q#115676, answer score: 4
