patternjavascriptMinor
VCF parser for eventual genomic data visualization
Viewed 0 times
parsereventualvcfforgenomicvisualizationdata
Problem
I've just started out writing an app that will visualize genomic data for anybody to understand.
When you get your genome sequenced the raw data usually comes in the form of a VCF file. I started out by drafting parser below that essentially creates a JavaScript object out of a VCF text file. Next I think want to write a schema-like JSON file (since there doesn't seem to be any database with an API that correlate genotypes to phenotypes based on GWAS) that I can query for phenotypes with the genotypes generated by the parser. For example, if I want to know if a specific genome has the genotype for lactose-intolerance, red hair or alcoholism pre-disposition, I can consult this manually written schema that maps genotypes to phenotypes.
If you have genomics background all the better, but I'm mostly looking for advice concerning my usage of OOP and any other insights. How would you go about the more downstream front-end parts of the app (the actual visualization, was thinking something like a timeline and a 'humonculus'. React? D3?)?
```
(function iife() {
'use strict'
const fs = require('fs');
// Output class for parsing genotype file
class Genome {
constructor(name) {
this.name = name
// Find genotype by rs-ID without having to know chromosome
this.findByRsId = (rs_id) => {
for(var chr in this) {
if (chr.substring(0, 1) === 'c') {
for(var id in this[chr]) {
if (id == rs_id) {
return this[chr][rs_id]
}
}
}
}
}
// Get actual sample's genotype
this.getSampGenotype = (rs_id) => {
let rs = this.findByRsId(rs_id)
switch(rs.fit) {
case '0/0':
return rs['genotype'][0]+rs['genotype'][0]
break
case '0/1':
return rs.genotype
break
case '1/1':
return rs['genotype'][1]+rs['genotype'][1]
break
case '0':
When you get your genome sequenced the raw data usually comes in the form of a VCF file. I started out by drafting parser below that essentially creates a JavaScript object out of a VCF text file. Next I think want to write a schema-like JSON file (since there doesn't seem to be any database with an API that correlate genotypes to phenotypes based on GWAS) that I can query for phenotypes with the genotypes generated by the parser. For example, if I want to know if a specific genome has the genotype for lactose-intolerance, red hair or alcoholism pre-disposition, I can consult this manually written schema that maps genotypes to phenotypes.
If you have genomics background all the better, but I'm mostly looking for advice concerning my usage of OOP and any other insights. How would you go about the more downstream front-end parts of the app (the actual visualization, was thinking something like a timeline and a 'humonculus'. React? D3?)?
```
(function iife() {
'use strict'
const fs = require('fs');
// Output class for parsing genotype file
class Genome {
constructor(name) {
this.name = name
// Find genotype by rs-ID without having to know chromosome
this.findByRsId = (rs_id) => {
for(var chr in this) {
if (chr.substring(0, 1) === 'c') {
for(var id in this[chr]) {
if (id == rs_id) {
return this[chr][rs_id]
}
}
}
}
}
// Get actual sample's genotype
this.getSampGenotype = (rs_id) => {
let rs = this.findByRsId(rs_id)
switch(rs.fit) {
case '0/0':
return rs['genotype'][0]+rs['genotype'][0]
break
case '0/1':
return rs.genotype
break
case '1/1':
return rs['genotype'][1]+rs['genotype'][1]
break
case '0':
Solution
Your code is good, but it could be improved:
Nesting
Try and avoid massive nesting like this, but reversing the conditions and
Into something a little less nested:
Consistency
This is the only semi-colon you use in your entire script:
You should keep your consistency in check, and either do or don't use semicolons (I would suggest the former), but keep consistent.
Naming
You're shouldn't be sacrificing readability in your code just to save space.
What is
You use the abbreviation/term
Excuse the horrible pun there, but...
Your switch statement can be simplified:
Here, you're defining multiple cases that have the same result.
There's two ways to approach this problem, while the first is better for readability, it's redundant in this case not to use the second
Take this snippet for example:
What do you think would print?
You would expect that to print
Change
Number 1: So, seeing as content inside a
However, Number 2:
Depending on how you feel about readability, you could make it look like:
Which, while redundant, it readably shows that cases
Indentation:
Your block at the moment, looks like this:
I see two things wrong with this:
Duplication:
You have a bit of duplication in this code block:
See how many times you needed to call
Magic numbe
Nesting
Try and avoid massive nesting like this, but reversing the conditions and
continueing:class Genome {
constructor(name) {
this.name = name
// Find genotype by rs-ID without having to know chromosome
this.findByRsId = (rs_id) => {
for(var chr in this) {
if (chr.substring(0, 1) === 'c') {
for(var id in this[chr]) {
if (id == rs_id) {
return this[chr][rs_id]
}
}
}
}
}Into something a little less nested:
class Genome {
constructor(name) {
this.name = name
// Find genotype by rs-ID without having to know chromosome
this.findByRsId = (rs_id) => {
for(var chr in this) {
if (chr.substring(0, 1) !== 'c') continue;
for(var id in this[chr]) {
if (id != rs_id) continue;
return this[chr][rs_id]
}
}
}Consistency
This is the only semi-colon you use in your entire script:
const fs = require('fs');You should keep your consistency in check, and either do or don't use semicolons (I would suggest the former), but keep consistent.
Naming
You're shouldn't be sacrificing readability in your code just to save space.
getSampGenotype: Literally two characters missing to massively improve readability
handleGFGChunk: This could be improved, however I would assume those working your code would know what it means, I, on the other hand, don't.
What is
rs??!??You use the abbreviation/term
rs everywhere in your code, but to anyone without a doctorate, this might not be clear.switch it to better code:Excuse the horrible pun there, but...
Your switch statement can be simplified:
case '0':
return rs.genotype
break
case '1':
return rs.genotype
break
default:
return rs.genotype
breakHere, you're defining multiple cases that have the same result.
There's two ways to approach this problem, while the first is better for readability, it's redundant in this case not to use the second
switch statements use fall throughs, which means, if you don't explictly declare break, it will keep falling down and down.Take this snippet for example:
document.getElementById("testbutton").addEventListener('click', function(){
var sample = "ACTA",
result = "";
switch(sample){
case "ACTA":
result += "A";
case "AATC":
result += "B";
case "UUCA":
result += "C";
case "FAIL":
result += "D";
default:
result += "E";
}
document.getElementById("content").innerText = result;
});
Test
What do you think would print?
You would expect that to print
A, because it's the only case that matches, but because it falls through, it catches all of them and alerts ABCDE.Change
sample to UUCA and watch what happens:CDE is alerted.Number 1: So, seeing as content inside a
case is optional, as shown above, you can "merge" cases, like the following:case '0': case '1': default:
return rs.genotype
breakHowever, Number 2:
default is a catch-all of sorts, anything that isn't declared as a case above is caught by default, so not declaring case '0' and case '1' is the same as declaring them in this case.Depending on how you feel about readability, you could make it look like:
case '0': case '1': default:
return rs.genotype
breakWhich, while redundant, it readably shows that cases
0, 1 have been considered and their result determined.Indentation:
Your block at the moment, looks like this:
(function iife() {
'use strict'
class Genome {
constructor(name) {
}
}
})()I see two things wrong with this:
- After your IIFE declaration, there isn't an added indentation
- You're using two-space indentation, when you should be using four-space indentation.
Duplication:
You have a bit of duplication in this code block:
if(!myGenotypes["chr_"+row[0]]) {
myGenotypes["chr_"+row[0]] = {}
} else {
// If no rs_[num] exists, make one and set it to genotype
if (!myGenotypes["chr_"+row[0]][row[2]]) {
myGenotypes["chr_"+row[0]][row[2]] = {
'genotype': row[3]+row[4],
'quality': row[6],
'fit': row[9]
}
}
}See how many times you needed to call
"chr_" +row[0]? That's how you know you deserve a variable.var propertyName = "chr_" + row[0]
if(!myGenotypes[propertyName]) {
myGenotypes[propertyName] = {}
} else {
// If no rs_[num] exists, make one and set it to genotype
var subPropertyName = row[2]
if (!myGenotypes[propertyName][subPropertyName]) {
myGenotypes[propertyName][subPropertyName] = {
'genotype': row[3] + row[4],
'quality': row[6],
'fit': row[9]
}
}
}Magic numbe
Code Snippets
class Genome {
constructor(name) {
this.name = name
// Find genotype by rs-ID without having to know chromosome
this.findByRsId = (rs_id) => {
for(var chr in this) {
if (chr.substring(0, 1) === 'c') {
for(var id in this[chr]) {
if (id == rs_id) {
return this[chr][rs_id]
}
}
}
}
}class Genome {
constructor(name) {
this.name = name
// Find genotype by rs-ID without having to know chromosome
this.findByRsId = (rs_id) => {
for(var chr in this) {
if (chr.substring(0, 1) !== 'c') continue;
for(var id in this[chr]) {
if (id != rs_id) continue;
return this[chr][rs_id]
}
}
}const fs = require('fs');case '0':
return rs.genotype
break
case '1':
return rs.genotype
break
default:
return rs.genotype
breakcase '0': case '1': default:
return rs.genotype
breakContext
StackExchange Code Review Q#116512, answer score: 3
Revisions (0)
No revisions yet.