A Java class for reading MaCH dosage files
Problem
A dosage file (used in computational genetics) is formatted like this:
```
// ID TAG geno1 geno2 geno3 ...
1->76016 DOSE 1.871 1.832 1.897 1.995 1.884 1.856 1.405 1.853 1.998 1.885 1.390 1.971 1.890 1.699
2->76018 DOSE 1.877 1.832 1.897 1.995 1.884 1.856 1.405 1.853 1.998 1.885 1.390 1.971 1.890 1.699
3->76063 DOSE 1.877 1.832 1.893 1.995 1.884 1.856 1.405 1.853 1.998 1.885 1.390 1.971 1.890 1.699
4->76015 DOSE 1.877 1.832 1.897 1.994 1.884 1.856 1.405 1.853 1.998 1.885 1.390 1.971 1.890 1.699
5->76023 DOSE 1.877 1.832 1.897 1.995 1.885 1.856 1.405 1.853 1.998 1.885 1.390 1.971 1.890 1.699
6->76030 DOSE 1.877 1.832 1.897 1.995 1.884 1.856 1.405 1.853 1.998 1.885 1.390 1.971 1.890 1.699
7->76044 DOSE 1.877 1.832 1.897 1.995 1.884 1.856 1.407 1.853 1.998 1.885 1.390 1.971 1.890 1.699
8->76008 DOSE 1.877 1.832 1.897 1.995 1.884 1.856 1.405 1.858 1.998 1.885 1.390 1.971 1.890 1.699
9->76014 DOSE 1.877 1.832 1.897 1.995 1.884 1.856 1.405 1.853 1.999 1.885 1.390 1.971 1.890 1.699
10->76011 DOSE 1.877 1.832 1.897 1.995 1.884 1.856 1.405 1.853 1.998 1.880 1.390 1.971 1.890 1.699
```
Here is a class for reading this kind of file:
`Dosage`:
```
import java.io.*;
import java.util.ArrayList;
import java.util.regex.Pattern;
import java.util.zip.GZIPInputStream;

/**
 * Created by vu.co.kaiyin.ReadMachDosage on 15/11/14.
 */
public class Dosage {
    final String filename;
    final Pattern splitPattern;
    final int nColFile;
    final int nCol;
    final int nRow;

    public Dosage(String filename) throws Exception {
        this.filename = filename;
        splitPattern = Pattern.compile("\\s+");
        int[] dimArray = dimensions();
        nColFile = dimArray[0];
        nCol = dimArray[1];
        nRow = dimArray[2];
    }

    private int[] dimensions() throws Exception {
        int nColFileCounter = 0;
        int nRowCounter = 0;
        try(
                InputStream fileStream = new FileInputStream(filename);
                InputStream gzipStream =
```
Solution
It's a very bad practice when a method is declared with `throws Exception`. The purpose of throwing exceptions is to signal the caller that something went wrong, and give it a chance to recover gracefully. The more generic the exception, the less it helps the caller to recover.
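To illustrate the point about specific exceptions, here is a minimal sketch, not taken from the original class. The names `RowCounter`, `countRows` and `describe` are hypothetical, and it reads a plain text file to keep the example short. Because the method declares `IOException` rather than `Exception`, the caller can react differently to a missing file and to other read failures:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, for illustration only.
final class RowCounter {
    private RowCounter() {}

    // Declares IOException instead of Exception: the signature tells
    // the caller that only I/O problems are expected here.
    static int countRows(Path path) throws IOException {
        int rows = 0;
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            while (reader.readLine() != null) {
                rows++;
            }
        }
        return rows;
    }

    // The caller can recover differently depending on what went wrong.
    static String describe(Path path) {
        try {
            return countRows(path) + " rows";
        } catch (NoSuchFileException e) {
            return "missing file: " + e.getFile();
        } catch (IOException e) {
            return "cannot read: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(Paths.get("no-such-file.dose")));
    }
}
```

With `throws Exception`, the two `catch` branches above would collapse into one generic handler, and the caller would lose the chance to, say, prompt for another filename when the file is simply missing.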
The code opens and unzips the file twice, in different methods.
This is bad for many reasons:
- Waste of processing
- The code to open the file (input stream, buffered input stream, unzipping) is duplicated, which goes against the DRY (don't repeat yourself) principle
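One way to remove the duplication is to put the stream setup in a single helper and call it from wherever the file needs to be read. The following is only a sketch under assumptions: the class name `GzipText` and its methods are hypothetical, and UTF-8 is assumed as the text encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: the only place that knows how to open a gzipped text file.
final class GzipText {
    private GzipText() {}

    // Opens a gzip-compressed text file for line-by-line reading.
    // The caller is responsible for closing the returned reader.
    static BufferedReader open(Path path) throws IOException {
        InputStream raw = Files.newInputStream(path);
        try {
            return new BufferedReader(new InputStreamReader(
                    new GZIPInputStream(raw), StandardCharsets.UTF_8));
        } catch (IOException e) {
            raw.close(); // do not leak the file handle if the gzip header is invalid
            throw e;
        }
    }

    // Example caller: counts lines using the shared helper.
    static int countLines(Path path) throws IOException {
        int count = 0;
        try (BufferedReader reader = open(path)) {
            while (reader.readLine() != null) {
                count++;
            }
        }
        return count;
    }
}
```

Better still, the dimensions can be collected during the same pass that reads the values, so the file is opened and decompressed only once.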
The collaborating classes are not well designed:
- `Dosage`: reads dosage from a file, with properties like `filename` and `splitPattern`
- `PrintArray`: a utility class with a `print` method to print a matrix
This is an incoherent design, with not much logic connecting the elements.
I suggest rethinking the classes and their responsibilities and properties.
How about something like this:
- `DosageMatrix`: a simple object to contain a dosage matrix.
  - May have a `print` method to print the matrix in a nice format
  - May have a `createFromZipFile` factory method that can process a zip file and create a `DosageMatrix` instance. The method could take `startCol`, `endCol` parameters
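The suggested design could be sketched roughly as follows. This is an illustration under several assumptions, not a definitive implementation: the file is gzip-compressed (as in the question's code) even though the factory is named `createFromZipFile`, it has no header line, the first two columns (ID and TAG) are skipped, and `startCol`/`endCol` are 0-based genotype column indices with `endCol` exclusive. The members `format`, `rowCount` and `get` are hypothetical additions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;
import java.util.zip.GZIPInputStream;

final class DosageMatrix {
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");
    // Assumption: every row starts with two non-numeric columns, ID and TAG.
    private static final int META_COLS = 2;

    private final double[][] values;

    private DosageMatrix(double[][] values) {
        this.values = values;
    }

    // Reads genotype columns [startCol, endCol) from a gzipped dosage file in one pass.
    static DosageMatrix createFromZipFile(Path path, int startCol, int endCol) throws IOException {
        if (startCol < 0 || endCol < startCol) {
            throw new IllegalArgumentException("invalid column range");
        }
        List<double[]> rows = new ArrayList<>();
        try (InputStream raw = Files.newInputStream(path);
             BufferedReader reader = new BufferedReader(new InputStreamReader(
                     new GZIPInputStream(raw), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // tolerate blank lines
                }
                String[] fields = WHITESPACE.split(line.trim());
                if (fields.length < META_COLS + endCol) {
                    throw new IOException("row has too few columns: " + fields[0]);
                }
                double[] row = new double[endCol - startCol];
                for (int j = startCol; j < endCol; j++) {
                    row[j - startCol] = Double.parseDouble(fields[META_COLS + j]);
                }
                rows.add(row);
            }
        }
        return new DosageMatrix(rows.toArray(new double[0][]));
    }

    int rowCount() {
        return values.length;
    }

    double get(int row, int col) {
        return values[row][col];
    }

    // Tab-separated text, one matrix row per line.
    String format() {
        StringBuilder sb = new StringBuilder();
        for (double[] row : values) {
            for (double value : row) {
                sb.append(String.format(Locale.ROOT, "%f\t", value));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    void print() {
        System.out.print(format());
    }
}
```

A caller would then write something like `DosageMatrix.createFromZipFile(path, 0, 14).print()`, and the file-handling details stay inside one class.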
When printing a `double[][]` matrix, you can use a for-each instead of an old-fashioned `for (;;)`. Instead of this:
```
for(int i=0; i<nrow; i++) {
    for(int j=0; j<ncol; j++) {
        sb.append(String.format("%f\t", matrix[i][j]));
    }
}
```
Something like this, without the tedious `i`, `j` loop index variables:
```
for (double[] row : matrix) {
    for (double value : row) {
        sb.append(String.format("%f\t", value));
    }
}
```
I don't know how important the `\t` separator is to you. If you don't mind using `,` instead, then you could further simplify by using `Arrays.toString` and eliminate a nested loop:
```
for (double[] row : matrix) {
    sb.append(Arrays.toString(row));
}
```
Context
StackExchange Code Review Q#69980, answer score: 2