Rosalind problem "Consensus and Profile"
Problem
Source: Rosalind ("Consensus and Profile")
Brief summary
Given: A collection of at most 10 DNA strings of equal length (at most 1 kbp) in FASTA format.
Return: A consensus string and profile matrix for the collection. (If several possible consensus strings exist, then you may return any one of them.)
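For readers unfamiliar with the input format: in FASTA, each record starts with a label line beginning with `>`, and the sequence that follows may be wrapped over several lines. A minimal sketch of a reader under that assumption follows; `parse_fasta` and the sample labels are hypothetical names chosen for illustration and are not part of the code under review.

```ruby
# Hypothetical helper (not from the original post): parse FASTA text into
# a hash of { label => sequence }, joining sequences wrapped over lines.
def parse_fasta(text)
  records = {}
  label = nil
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    if line.start_with?('>')
      # A new record begins; everything after '>' is its label.
      label = line[1..-1]
      records[label] = String.new
    elsif label
      # Sequence data: append it to the current record.
      records[label] << line
    end
  end
  records
end

# Made-up sample; the second record is wrapped over two lines.
sample = <<~FASTA
  >Rosalind_1
  ATCCAGCT
  >Rosalind_2
  GGGCA
  ACT
FASTA

puts parse_fasta(sample).values
```

The values of the returned hash are the equal-length DNA strings that the rest of the problem works with.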
DNA Strings    A T C C A G C T
               G G G C A A C T
               A T G G A T C T
               A A G C A A C C
               T T G G A A C T
               A T G C C A T T
               A T G G C A C T
Profile      A 5 1 0 0 5 5 0 0
             C 0 0 1 4 2 0 6 1
             G 1 1 6 3 0 1 0 0
             T 1 5 0 0 0 1 1 6
Consensus      A T G C A A C T
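The sample table above can be reproduced with a few lines of Ruby. This is only a sketch, not the code under review: the names `bases`, `strings`, `columns`, and `profile` are chosen here for illustration, and ties (none occur in this sample) are left to whichever base `max_by` returns, which the problem permits.

```ruby
# Sketch only: variable names are illustrative, not from the original post.
bases = %w(A C G T)

# The seven sample strings from the table, one per row.
strings = %w(ATCCAGCT GGGCAACT ATGGATCT AAGCAACC TTGGAACT ATGCCATT ATGGCACT)

# Each column holds the bases found at one position across all strings.
columns = strings.map(&:chars).transpose

# profile[base][j] is the number of times base occurs in column j.
profile = bases.map { |b| [b, columns.map { |col| col.count(b) }] }.to_h

# The consensus takes a most frequent base from each column.
consensus = columns.each_index.map { |j| bases.max_by { |b| profile[b][j] } }.join

puts consensus
bases.each { |b| puts "#{b}: #{profile[b].join(' ')}" }
# Output:
# ATGCAACT
# A: 5 1 0 0 5 5 0 0
# C: 0 0 1 4 2 0 6 1
# G: 1 1 6 3 0 1 0 0
# T: 1 5 0 0 0 1 1 6
```

Building the profile first and reading the consensus off it mirrors the order of the table: counts per column, then the winner of each column.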
Model (cons.rb):
```
#!/usr/bin/env ruby
require_relative '../ie_module'

class DnaConsensus
  include ImportExport

  DNA_BASES = %w(A C G T)

  attr_reader :dna_strings, :consensus, :profile

  def initialize(source = "rosalind_#{current_dir_name}.txt")
    @dna_strings = (source =~ /txt$/ ? import_lines(source) : source).values
    @profile = build_profile
    @consensus = build_consensus
  end

  def to_s
    "#{consensus.join}\n#{stringify(profile)}"
  end

  private

  def build_profile
    prof = DNA_BASES.map{|b| [b, []]}.to_h
    dna_strings.map(&:chars).transpose.each.with_object(prof) do |arr, hsh|
      hsh.merge!(hashed(arr)){ |_, oldval, newval| oldval << newval }
    end
  end

  def hashed(arr)
    hsh = arr.group_by(&:chr).map{ |k,v| [k, v.size] }.to_h
    (DNA_BASES - hsh.keys).each { |b| hsh[b] = 0 }
    hsh
  end

  def build_consensus
    dna_strings.first.length.times.with_object([]) do |index, arr|
      arr << profile.max_by{|_, list| list[index]}.first
    end
  end
end

a = DnaConsensus.new
a.export_to_file([a.to_s])
```
File read/write logic (ie_module.rb):
```
module ImportExport
  def export_to_file(result, file = "result_#{current_dir_name}.txt")
    File.open(file, 'w') do |f|
      result.each{ |val| f /, '')
$' ? hsh[line] = '' : hsh[h
```
Solution
I am going to ignore all the file-reading code, which is extraneous to the problem, and focus just on simplifying the code that finds the consensus. It can be done in essentially a single line, the one beginning consensus = .... Everything else is just setting up the sample data. Calling transpose gets us the columns, and max_by ... count gets us the most frequently occurring nucleotide in each column:
Code Snippets
matrix = <<EOS
A T C C A G C T
G G G C A A C T
A T G G A T C T
A A G C A A C C
T T G G A A C T
A T G C C A T T
A T G G C A C T
EOS
.split("\n").map{|x| x.split(' ')}
nucleotides = %w(A C G T)
consensus = matrix.transpose.map {|x| nucleotides.max_by {|n| x.count(n)}}
p consensus #=> ["A", "T", "G", "C", "A", "A", "C", "T"]
Context
StackExchange Code Review Q#115676, answer score: 4