HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavascriptMinor

VCF parser for eventual genomic data visualization

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
parsereventualvcfforgenomicvisualizationdata

Problem

I've just started out writing an app that will visualize genomic data for anybody to understand.

When you get your genome sequenced the raw data usually comes in the form of a VCF file. I started out by drafting parser below that essentially creates a JavaScript object out of a VCF text file. Next I think want to write a schema-like JSON file (since there doesn't seem to be any database with an API that correlate genotypes to phenotypes based on GWAS) that I can query for phenotypes with the genotypes generated by the parser. For example, if I want to know if a specific genome has the genotype for lactose-intolerance, red hair or alcoholism pre-disposition, I can consult this manually written schema that maps genotypes to phenotypes.

If you have genomics background all the better, but I'm mostly looking for advice concerning my usage of OOP and any other insights. How would you go about the more downstream front-end parts of the app (the actual visualization, was thinking something like a timeline and a 'humonculus'. React? D3?)?

```
(function iife() {

'use strict'

const fs = require('fs');

// Output class for parsing genotype file
class Genome {
constructor(name) {
this.name = name
// Find genotype by rs-ID without having to know chromosome
this.findByRsId = (rs_id) => {
for(var chr in this) {
if (chr.substring(0, 1) === 'c') {
for(var id in this[chr]) {
if (id == rs_id) {
return this[chr][rs_id]
}
}
}
}
}
// Get actual sample's genotype
this.getSampGenotype = (rs_id) => {
let rs = this.findByRsId(rs_id)
switch(rs.fit) {
case '0/0':
return rs['genotype'][0]+rs['genotype'][0]
break
case '0/1':
return rs.genotype
break
case '1/1':
return rs['genotype'][1]+rs['genotype'][1]
break
case '0':

Solution

Your code is good, but it could be improved:

Nesting

Try and avoid massive nesting like this, but reversing the conditions and continueing:

class Genome {
  constructor(name) {
      this.name = name
      // Find genotype by rs-ID without having to know chromosome
      this.findByRsId = (rs_id) => {
        for(var chr in this) {
          if (chr.substring(0, 1) === 'c') {
            for(var id in this[chr]) {
              if (id == rs_id) {
                return this[chr][rs_id]
              }
            }
          }
        }
      }


Into something a little less nested:

class Genome {
  constructor(name) {
      this.name = name
      // Find genotype by rs-ID without having to know chromosome
      this.findByRsId = (rs_id) => {
        for(var chr in this) {
          if (chr.substring(0, 1) !== 'c') continue;
          for(var id in this[chr]) {
            if (id != rs_id) continue;
            return this[chr][rs_id]
          }
        }
      }


Consistency

This is the only semi-colon you use in your entire script:

const fs = require('fs');


You should keep your consistency in check, and either do or don't use semicolons (I would suggest the former), but keep consistent.

Naming

You're shouldn't be sacrificing readability in your code just to save space.

  • getSampGenotype: Literally two characters missing to massively improve readability



  • handleGFGChunk: This could be improved, however I would assume those working your code would know what it means, I, on the other hand, don't.



What is rs??!??

You use the abbreviation/term rs everywhere in your code, but to anyone without a doctorate, this might not be clear.

switch it to better code:

Excuse the horrible pun there, but...

Your switch statement can be simplified:

case '0':
        return rs.genotype
        break
      case '1':
        return rs.genotype
        break
      default:
        return rs.genotype
        break


Here, you're defining multiple cases that have the same result.

There's two ways to approach this problem, while the first is better for readability, it's redundant in this case not to use the second

switch statements use fall throughs, which means, if you don't explictly declare break, it will keep falling down and down.

Take this snippet for example:



document.getElementById("testbutton").addEventListener('click', function(){
var sample = "ACTA",
result = "";
switch(sample){
case "ACTA":
result += "A";
case "AATC":
result += "B";
case "UUCA":
result += "C";
case "FAIL":
result += "D";
default:
result += "E";
}
document.getElementById("content").innerText = result;
});

Test




What do you think would print?


You would expect that to print A, because it's the only case that matches, but because it falls through, it catches all of them and alerts ABCDE.

Change sample to UUCA and watch what happens:


CDE is alerted.

Number 1: So, seeing as content inside a case is optional, as shown above, you can "merge" cases, like the following:

case '0': case '1': default:
        return rs.genotype
        break


However, Number 2: default is a catch-all of sorts, anything that isn't declared as a case above is caught by default, so not declaring case '0' and case '1' is the same as declaring them in this case.

Depending on how you feel about readability, you could make it look like:

case '0': case '1': default:
        return rs.genotype
        break


Which, while redundant, it readably shows that cases 0, 1 have been considered and their result determined.

Indentation:

Your block at the moment, looks like this:

(function iife() {

'use strict'
class Genome {
  constructor(name) {
  }
}
})()


I see two things wrong with this:

  • After your IIFE declaration, there isn't an added indentation



  • You're using two-space indentation, when you should be using four-space indentation.



Duplication:

You have a bit of duplication in this code block:

if(!myGenotypes["chr_"+row[0]]) {
  myGenotypes["chr_"+row[0]] = {}
} else {
  // If no rs_[num] exists, make one and set it to genotype
  if (!myGenotypes["chr_"+row[0]][row[2]]) {
    myGenotypes["chr_"+row[0]][row[2]] = {
      'genotype': row[3]+row[4],
      'quality': row[6],
      'fit': row[9]
    }
  }
}


See how many times you needed to call "chr_" +row[0]? That's how you know you deserve a variable.

var propertyName = "chr_" + row[0]
if(!myGenotypes[propertyName]) {
  myGenotypes[propertyName] = {}
} else {
  // If no rs_[num] exists, make one and set it to genotype
  var subPropertyName = row[2]
  if (!myGenotypes[propertyName][subPropertyName]) {
    myGenotypes[propertyName][subPropertyName] = {
      'genotype': row[3] + row[4],
      'quality': row[6],
      'fit': row[9]
    }
  }
}


Magic numbe

Code Snippets

class Genome {
  constructor(name) {
      this.name = name
      // Find genotype by rs-ID without having to know chromosome
      this.findByRsId = (rs_id) => {
        for(var chr in this) {
          if (chr.substring(0, 1) === 'c') {
            for(var id in this[chr]) {
              if (id == rs_id) {
                return this[chr][rs_id]
              }
            }
          }
        }
      }
class Genome {
  constructor(name) {
      this.name = name
      // Find genotype by rs-ID without having to know chromosome
      this.findByRsId = (rs_id) => {
        for(var chr in this) {
          if (chr.substring(0, 1) !== 'c') continue;
          for(var id in this[chr]) {
            if (id != rs_id) continue;
            return this[chr][rs_id]
          }
        }
      }
const fs = require('fs');
case '0':
        return rs.genotype
        break
      case '1':
        return rs.genotype
        break
      default:
        return rs.genotype
        break
case '0': case '1': default:
        return rs.genotype
        break

Context

StackExchange Code Review Q#116512, answer score: 3

Revisions (0)

No revisions yet.