A Java class for reading MaCH dosage files

Submitted by: @import:stackexchange-codereview

Problem

A dosage file (used in computational genetics) is formatted like this:

// ID    TAG  geno1 geno2 geno3 ...
1->76016 DOSE 1.871 1.832 1.897 1.995 1.884 1.856 1.405 1.853 1.998 1.885 1.390 1.971 1.890 1.699
2->76018 DOSE 1.877 1.832 1.897 1.995 1.884 1.856 1.405 1.853 1.998 1.885 1.390 1.971 1.890 1.699
3->76063 DOSE 1.877 1.832 1.893 1.995 1.884 1.856 1.405 1.853 1.998 1.885 1.390 1.971 1.890 1.699
4->76015 DOSE 1.877 1.832 1.897 1.994 1.884 1.856 1.405 1.853 1.998 1.885 1.390 1.971 1.890 1.699
5->76023 DOSE 1.877 1.832 1.897 1.995 1.885 1.856 1.405 1.853 1.998 1.885 1.390 1.971 1.890 1.699
6->76030 DOSE 1.877 1.832 1.897 1.995 1.884 1.856 1.405 1.853 1.998 1.885 1.390 1.971 1.890 1.699
7->76044 DOSE 1.877 1.832 1.897 1.995 1.884 1.856 1.407 1.853 1.998 1.885 1.390 1.971 1.890 1.699
8->76008 DOSE 1.877 1.832 1.897 1.995 1.884 1.856 1.405 1.858 1.998 1.885 1.390 1.971 1.890 1.699
9->76014 DOSE 1.877 1.832 1.897 1.995 1.884 1.856 1.405 1.853 1.999 1.885 1.390 1.971 1.890 1.699
10->76011 DOSE 1.877 1.832 1.897 1.995 1.884 1.856 1.405 1.853 1.998 1.880 1.390 1.971 1.890 1.699


Here is a class for reading this kind of file:

Dosage

```
import java.io.*;
import java.util.ArrayList;
import java.util.regex.Pattern;
import java.util.zip.GZIPInputStream;

/**
 * Created by vu.co.kaiyin.ReadMachDosage on 15/11/14.
 */
public class Dosage {
    final String filename;
    final Pattern splitPattern;
    final int nColFile;
    final int nCol;
    final int nRow;

    public Dosage(String filename) throws Exception {
        this.filename = filename;
        splitPattern = Pattern.compile("\\s+");
        int[] dimArray = dimensions();
        nColFile = dimArray[0];
        nCol = dimArray[1];
        nRow = dimArray[2];
    }

    private int[] dimensions() throws Exception {
        int nColFileCounter = 0;
        int nRowCounter = 0;
        try (
            InputStream fileStream = new FileInputStream(filename);
            InputStream gzipStream =
```

Solution

Declaring a method with throws Exception is bad practice. The purpose of throwing an exception is to signal to the caller that something went wrong and to give it a chance to recover gracefully. The more generic the exception,
the less it helps the caller recover: with throws Exception, the caller cannot tell a missing file from a malformed line.
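
As a minimal sketch of the alternative, assuming the only checked failure is an I/O error: the hypothetical RowCounter.countRows below (not part of the original class) declares IOException instead of Exception, so the caller can write a handler for exactly that case.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper for illustration only; not taken from the original Dosage class.
class RowCounter {
    // Declares IOException, the one checked failure this method can actually hit.
    static int countRows(Reader source) throws IOException {
        int rows = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    rows++; // count only non-blank lines
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        try {
            String sample = "1->76016 DOSE 1.871\n2->76018 DOSE 1.877\n";
            System.out.println(countRows(new StringReader(sample)) + " rows");
        } catch (IOException e) {
            // The caller knows which kind of failure it is handling here.
            System.err.println("Could not read dosage data: " + e.getMessage());
        }
    }
}
```

Malformed numbers would surface as the unchecked NumberFormatException from Double.parseDouble, which a caller can also catch specifically if it wants to report bad input.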

The code opens and unzips the file twice, in different methods.
This is bad for two reasons:

  • It wastes processing: the file is read and decompressed twice

  • The code to open the file (input stream, buffered input stream, unzipping) is duplicated, which goes against the DRY (don't repeat yourself) principle
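
One possible way to address both points, sketched under the assumption that the whole file fits in memory: keep the stream-wrapping code in a single helper and read the data in one pass, so the dimensions fall out of what was already read. The names GzipLines, openGzipReader, readAllLines and gzip are made up for this example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class; all names here are illustrative.
class GzipLines {
    // The only place that knows how to turn a gzipped stream into a reader.
    static BufferedReader openGzipReader(InputStream raw) throws IOException {
        return new BufferedReader(
                new InputStreamReader(new GZIPInputStream(raw), StandardCharsets.UTF_8));
    }

    // Single pass: the row count is lines.size(), so no second read is needed.
    static List<String> readAllLines(InputStream raw) throws IOException {
        List<String> lines = new ArrayList<>();
        try (InputStream in = raw; BufferedReader reader = openGzipReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Demo helper: gzip a string in memory so the example needs no file on disk.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = gzip("1->76016 DOSE 1.871 1.832\n2->76018 DOSE 1.877 1.832\n");
        List<String> lines = readAllLines(new ByteArrayInputStream(data));
        System.out.println(lines.size() + " rows");
    }
}
```

For a real file, the caller would pass new FileInputStream(filename) to readAllLines; the wrapping logic then exists in exactly one method.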



The collaborating classes are not well designed:

  • Dosage: reads the dosage from a file, with properties like filename and splitPattern

  • PrintArray: a utility class with a print method to print a matrix

This is an incoherent design, with little logical connection between the elements.
I suggest rethinking the classes and their responsibilities and properties.
How about something like this:

  • DosageMatrix: a simple object to contain a dosage matrix.

      • May have a print method to print the matrix in a nice format

      • May have a createFromZipFile factory method that can process a gzipped file and create a DosageMatrix instance. The method could take startCol, endCol parameters
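
A rough sketch of what such a DosageMatrix could look like. To keep the example runnable without a file, the factory here is a hypothetical createFromReader that takes any Reader; a createFromZipFile method could wrap a GZIPInputStream in an InputStreamReader and delegate to it. The column convention (startCol inclusive, endCol exclusive, counted from the first value after the DOSE tag) is an assumption, and the sketch does no bounds validation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch only: a holder for parsed dosage values plus a factory and a formatter.
final class DosageMatrix {
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");
    private static final int LEADING_COLUMNS = 2; // the ID column and the DOSE tag

    private final double[][] values;

    private DosageMatrix(double[][] values) {
        this.values = values;
    }

    // Hypothetical factory: startCol is inclusive, endCol exclusive (assumed convention).
    static DosageMatrix createFromReader(Reader source, int startCol, int endCol)
            throws IOException {
        List<double[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] fields = WHITESPACE.split(line);
                double[] row = new double[endCol - startCol];
                for (int c = startCol; c < endCol; c++) {
                    row[c - startCol] = Double.parseDouble(fields[LEADING_COLUMNS + c]);
                }
                rows.add(row);
            }
        }
        return new DosageMatrix(rows.toArray(new double[0][]));
    }

    int rowCount() {
        return values.length;
    }

    double get(int row, int col) {
        return values[row][col];
    }

    // Tab-separated rendering, one matrix row per line.
    String format() {
        StringBuilder sb = new StringBuilder();
        for (double[] row : values) {
            for (double value : row) {
                sb.append(String.format("%f\t", value));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String sample = "1->76016 DOSE 1.871 1.832 1.897\n2->76018 DOSE 1.877 1.832 1.897\n";
        DosageMatrix matrix = createFromReader(new StringReader(sample), 0, 2);
        System.out.print(matrix.format());
    }
}
```

With this split, file handling lives in the factory and the object itself only holds and presents data, so a separate PrintArray utility is no longer needed.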



When printing a double[][] matrix,
you can use a for-each loop instead of the old-fashioned for (;;).
Instead of this:

for (int i = 0; i < nrow; i++) {
    for (int j = 0; j < ncol; j++) {
        sb.append(String.format("%f\t", matrix[i][j]));
    }
}

Something like this, without the tedious i, j loop index variables:

for (double[] row : matrix) {
    for (double value : row) {
        sb.append(String.format("%f\t", value));
    }
}


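I don't know how important the \t separator is to you.
If you don't mind a ", " separator instead (with square brackets around each row),
then you could simplify further by using Arrays.toString and eliminate the nested loop.
Note that the values are then printed with Double.toString rather than %f:

for (double[] row : matrix) {
    sb.append(Arrays.toString(row));
}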
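
To make the trade-off concrete, here is a small comparison. The bracketed method shows what Arrays.toString produces for a row; tabSeparated is an alternative that is not part of the suggestion above: it keeps the tab separator and the %f format by joining a stream of formatted values, at the cost of a bit more code. The class and method names are invented for this example, and Locale.ROOT is used only so that the decimal separator is always a dot.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.stream.Collectors;

// Illustrative names; neither method comes from the original code.
class RowFormatting {
    // The Arrays.toString variant: square brackets and ", " between values.
    static String bracketed(double[] row) {
        return Arrays.toString(row);
    }

    // Alternative: tab-separated %f values with no trailing tab and no inner loop.
    static String tabSeparated(double[] row) {
        return Arrays.stream(row)
                .mapToObj(value -> String.format(Locale.ROOT, "%f", value))
                .collect(Collectors.joining("\t"));
    }

    public static void main(String[] args) {
        double[][] matrix = { {1.871, 1.832}, {1.877, 1.832} };
        StringBuilder sb = new StringBuilder();
        for (double[] row : matrix) {
            sb.append(bracketed(row)).append('\n');    // first row: [1.871, 1.832]
            sb.append(tabSeparated(row)).append('\n'); // first row: 1.871000<TAB>1.832000
        }
        System.out.print(sb);
    }
}
```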

Context

StackExchange Code Review Q#69980, answer score: 2
