patternMinor
Nucleotide count in Scala
Viewed 0 times
countnucleotidescala
Problem
This is my second day in learning Scala and I still need to develop a taste of functional programming, I often find myself doing imperative coding. Below is the result of my TDD practice.
Code
Test Suite
```
import org.scalatest._
class NucleotideCountSpecs extends FlatSpec with Matchers {
"empty dna string" should "have no adenine" in {
new DNA("").nucleotideCounts('A') should be (0)
}
it should "have no nucleotides" in {
val expected = Map('A' -> 0, 'T' -> 0, 'C' -> 0, 'G' -> 0)
new DNA("").nucleotideCounts should be (expected)
}
"a repetitive sequence" should "count cytosine" in {
new DNA("CCCCC").nucleotideCounts('C') should be (5)
}
it should "have only guanine" in {
val expected = Map('A' -> 0, 'T' -> 0, 'C' -> 0, 'G' -> 8)
new DNA("GGGGGGGG").nucleotideCounts should be (expected)
}
"a mixed dna string" should "count only thymine" in {
new DNA("GGGGGTAACCCGG").nucleotideCounts('T') should be (1)
}
it should "count a nucleotide only once" in {
val dna = new DNA("CGATTGGG")
dna.nucleotideCounts('T')
dna.nucleotideCounts('T') should be (2)
}
it should "not change counts after counting adenine" in {
val dna = new DNA("GATTACA")
dna.nucleotideCounts('A')
val expected = Map('A' -> 3, 'T' -> 2, 'C' -> 1, 'G' -> 1)
dna.nucleotideCounts should be (expected)
}
it should "validate nucleotides" in {
a [RuntimeException] should be thrownBy new DNA("GACT").nucleotideCounts('X')
}
it should
Code
class DNA(dnaSequence: String) {
var count = collection.mutable.Map('A' -> 0, 'T' -> 0, 'C' -> 0, 'G' -> 0)
dnaSequence.foreach { (i: Char) =>
validate(i)
count = count.updated(i, count.get(i).getOrElse(0) + 1)
}
def nucleotideCounts() = count
def nucleotideCounts(nucleotide: Char) = {
validate(nucleotide)
count.get(nucleotide).getOrElse(0)
}
private def validate(chr: Char) = {
if (!count.contains(chr)) throw new RuntimeException()
}
}Test Suite
```
import org.scalatest._
class NucleotideCountSpecs extends FlatSpec with Matchers {
"empty dna string" should "have no adenine" in {
new DNA("").nucleotideCounts('A') should be (0)
}
it should "have no nucleotides" in {
val expected = Map('A' -> 0, 'T' -> 0, 'C' -> 0, 'G' -> 0)
new DNA("").nucleotideCounts should be (expected)
}
"a repetitive sequence" should "count cytosine" in {
new DNA("CCCCC").nucleotideCounts('C') should be (5)
}
it should "have only guanine" in {
val expected = Map('A' -> 0, 'T' -> 0, 'C' -> 0, 'G' -> 8)
new DNA("GGGGGGGG").nucleotideCounts should be (expected)
}
"a mixed dna string" should "count only thymine" in {
new DNA("GGGGGTAACCCGG").nucleotideCounts('T') should be (1)
}
it should "count a nucleotide only once" in {
val dna = new DNA("CGATTGGG")
dna.nucleotideCounts('T')
dna.nucleotideCounts('T') should be (2)
}
it should "not change counts after counting adenine" in {
val dna = new DNA("GATTACA")
dna.nucleotideCounts('A')
val expected = Map('A' -> 3, 'T' -> 2, 'C' -> 1, 'G' -> 1)
dna.nucleotideCounts should be (expected)
}
it should "validate nucleotides" in {
a [RuntimeException] should be thrownBy new DNA("GACT").nucleotideCounts('X')
}
it should
Solution
Encapsulation and information hiding
The parameterless
If later you wanted to change the way you store the counts,
for example you wanted to store in a database instead of a map in memory,
you won't be able to do that,
because you did not hide this information.
Exposing the map is especially suspicious since the map is mutable.
Users of this class could circumvent the validation logic,
for example delete nucleotides or add invalid nucleotides.
In any case, it would be better to declare the
Unnecessary
Thanks to the
and if we assume that users of the parameterless
it's guaranteed that
so you can replace all occurrences of
Mutable or not
It's not clear why you use a mutable map.
Your current implementation and tests would work with an immutable map too.
If you really want to use a mutable map, then you can replace this:
with:
Naming
For example the plural
which would also match the naming of your parameterless
You used multiple names for a character in a DNA string:
The parameterless
nucleotideCounts method exposes the map of counts.If later you wanted to change the way you store the counts,
for example you wanted to store in a database instead of a map in memory,
you won't be able to do that,
because you did not hide this information.
Exposing the map is especially suspicious since the map is mutable.
Users of this class could circumvent the validation logic,
for example delete nucleotides or add invalid nucleotides.
In any case, it would be better to declare the
count map private.Unnecessary
getOrElseThanks to the
validate method,and if we assume that users of the parameterless
nucleotideCounts method will not abuse the class and delete nucleotides,it's guaranteed that
count.get(c).get will always find a value,so you can replace all occurrences of
count.get(c).getOrElse(0).Mutable or not
It's not clear why you use a mutable map.
Your current implementation and tests would work with an immutable map too.
If you really want to use a mutable map, then you can replace this:
count = count.updated(i, count.get(i).getOrElse(0) + 1)with:
count.update(c, count.get(c).get + 1)Naming
count is a not a great name for a collection type.For example the plural
counts would be a more fitting name,which would also match the naming of your parameterless
nucleotideCounts method.You used multiple names for a character in a DNA string:
i, chr, nucleotide. It would be easier to read the code if you used a consistent name. i is especially a bad name, as it's typically used in counting loops.Code Snippets
count = count.updated(i, count.get(i).getOrElse(0) + 1)count.update(c, count.get(c).get + 1)Context
StackExchange Code Review Q#132217, answer score: 3
Revisions (0)
No revisions yet.