HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Nucleotide count in Scala

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
countnucleotidescala

Problem

This is my second day in learning Scala and I still need to develop a taste of functional programming, I often find myself doing imperative coding. Below is the result of my TDD practice.

Code

class DNA(dnaSequence: String) {
  var count = collection.mutable.Map('A' -> 0, 'T' -> 0, 'C' -> 0, 'G' -> 0)

  dnaSequence.foreach { (i: Char) =>
    validate(i)
    count = count.updated(i, count.get(i).getOrElse(0) + 1)
  }

  def nucleotideCounts() = count

  def nucleotideCounts(nucleotide: Char) = {
    validate(nucleotide)
    count.get(nucleotide).getOrElse(0)
  }

  private def validate(chr: Char) = {
    if (!count.contains(chr)) throw new RuntimeException()
  }
}


Test Suite

```
import org.scalatest._

class NucleotideCountSpecs extends FlatSpec with Matchers {
"empty dna string" should "have no adenine" in {
new DNA("").nucleotideCounts('A') should be (0)
}

it should "have no nucleotides" in {
val expected = Map('A' -> 0, 'T' -> 0, 'C' -> 0, 'G' -> 0)
new DNA("").nucleotideCounts should be (expected)
}

"a repetitive sequence" should "count cytosine" in {
new DNA("CCCCC").nucleotideCounts('C') should be (5)
}

it should "have only guanine" in {
val expected = Map('A' -> 0, 'T' -> 0, 'C' -> 0, 'G' -> 8)
new DNA("GGGGGGGG").nucleotideCounts should be (expected)
}

"a mixed dna string" should "count only thymine" in {
new DNA("GGGGGTAACCCGG").nucleotideCounts('T') should be (1)
}

it should "count a nucleotide only once" in {
val dna = new DNA("CGATTGGG")
dna.nucleotideCounts('T')
dna.nucleotideCounts('T') should be (2)
}

it should "not change counts after counting adenine" in {
val dna = new DNA("GATTACA")
dna.nucleotideCounts('A')
val expected = Map('A' -> 3, 'T' -> 2, 'C' -> 1, 'G' -> 1)
dna.nucleotideCounts should be (expected)
}

it should "validate nucleotides" in {
a [RuntimeException] should be thrownBy new DNA("GACT").nucleotideCounts('X')
}

it should

Solution

Encapsulation and information hiding

The parameterless nucleotideCounts method exposes the map of counts.
If later you wanted to change the way you store the counts,
for example you wanted to store in a database instead of a map in memory,
you won't be able to do that,
because you did not hide this information.

Exposing the map is especially suspicious since the map is mutable.
Users of this class could circumvent the validation logic,
for example delete nucleotides or add invalid nucleotides.

In any case, it would be better to declare the count map private.

Unnecessary getOrElse

Thanks to the validate method,
and if we assume that users of the parameterless nucleotideCounts method will not abuse the class and delete nucleotides,
it's guaranteed that count.get(c).get will always find a value,
so you can replace all occurrences of count.get(c).getOrElse(0).

Mutable or not

It's not clear why you use a mutable map.
Your current implementation and tests would work with an immutable map too.

If you really want to use a mutable map, then you can replace this:

count = count.updated(i, count.get(i).getOrElse(0) + 1)


with:

count.update(c, count.get(c).get + 1)


Naming

count is a not a great name for a collection type.
For example the plural counts would be a more fitting name,
which would also match the naming of your parameterless nucleotideCounts method.

You used multiple names for a character in a DNA string: i, chr, nucleotide. It would be easier to read the code if you used a consistent name. i is especially a bad name, as it's typically used in counting loops.

Code Snippets

count = count.updated(i, count.get(i).getOrElse(0) + 1)
count.update(c, count.get(c).get + 1)

Context

StackExchange Code Review Q#132217, answer score: 3

Revisions (0)

No revisions yet.