principle java Minor
Design Strategy of CSV Parser
parser design csv strategy
Problem
I would like a review of my design strategy for a CSV parser.
I have 4 CSV files with different layouts, as shown below.
Each row of a CSV file will be mapped to a class. For example, if CSV_FILE_A has 10 rows, calling the parser function returns a list containing 10 objects of ObjA. For CSV_FILE_B with 5 entries, a list of 5 ObjB is returned.
CSV_FILE_A: LONG, STR, STR
CSV_FILE_B: LONG, LONG
CSV_FILE_C: LONG, LONG, STR, LONG
CSV_FILE_D: LONG, LONG, LONG
I designed an interface like this.
```
public interface CSVParser {
    public List parseCSVFile(String fileName);
}
```
I have created 4 different parsers, which implement the interface and each have different logic for parsing their CSV file. The code below is a complete implementation for CSV_FILE_B. The other three parsers have the same structure but different if-else statements for setting object fields and different object types in the List.
```
public class csvBParser implements CSVParser {
    @Override
    public List parseCSVFile(String fileName) {
        List list = new ArrayList();
        try {
            ClassLoader classLoader = Thread.currentThread()
                    .getContextClassLoader();
            InputStream is = classLoader.getResourceAsStream(fileName);
            BufferedReader br = new BufferedReader(new InputStreamReader(is));
            String line = "";
            StringTokenizer token = null;
            int lineNum = 0, tokenNum = 0;
            while ((line = br.readLine()) != null) {
                if (lineNum == 0){
                    lineNum++;
                    continue;
                }
                token = new StringTokenizer(line, ",");
                ObjectB objB = new ObjectB();
                while (token.hasMoreTokens()) {
                    String nextToken = token.nextToken();
                    boolean isTokenNull = false;
                    if (nextToken.equalsIgnoreCas
```
Solution
First, there would be a lot of duplication between the parser classes. This kind of duplication should be avoided: if you find a bug in the way the CSV is parsed, you have to fix it in multiple places. Consider using the template method pattern. In this scenario, you would extract the common logic into a template method in an abstract superclass, which calls one or more abstract methods that the subclasses implement. For instance, the abstract superclass could parse the fields into an array of strings, then pass the array to a method that constructs an object from those fields.
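As a rough, runnable illustration of the template method idea described above, the sketch below keeps the line-reading loop in one abstract class and leaves only the row-to-object step to a subclass. The names `LineParser`, `LongPair`, and `LongPairParser` are hypothetical and are not the classes from the question. The sketch also assumes Java 7 or later, a header row, and simple comma-separated values with no quoting.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Small demo entry point; the input string is made-up sample data.
class TemplateMethodDemo {
    public static void main(String[] args) throws IOException {
        List<LongPair> rows = new LongPairParser().parse(new StringReader("id,count\n1,10\n2,20\n"));
        System.out.println(rows.size() + " rows parsed");
    }
}

// Hypothetical row type for a (LONG, LONG) layout; stands in for the question's ObjectB.
final class LongPair {
    final long first;
    final long second;

    LongPair(long first, long second) {
        this.first = first;
        this.second = second;
    }
}

// The shared loop lives here once; subclasses only decide how one row becomes an object.
abstract class LineParser<T> {
    // Template method: fixed algorithm, with buildObject as the varying step.
    final List<T> parse(Reader source) throws IOException {
        List<T> rows = new ArrayList<T>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line = reader.readLine(); // assumption: the first line is a header and is skipped
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // ignore blank lines
                }
                // Naive split: assumes no quoted fields or embedded commas.
                String[] fields = line.split(",", -1);
                rows.add(buildObject(fields));
            }
        }
        return rows;
    }

    protected abstract T buildObject(String[] fields);
}

final class LongPairParser extends LineParser<LongPair> {
    @Override
    protected LongPair buildObject(String[] fields) {
        return new LongPair(Long.parseLong(fields[0].trim()), Long.parseLong(fields[1].trim()));
    }
}
```

With this shape, a fix to the reading loop (for example, different header handling) is made once in `LineParser`, and each additional layout needs only a new `buildObject` implementation.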
```
public abstract class AbstractCSVParser<T> implements CSVParser<T> {
    public List<T> parseCSVFile(String fileName) {
        List<T> list = new ArrayList<T>();
        // for each line...
        String[] fields = // parse line into an array of strings
        T obj = buildObject(fields);
        list.add(obj);
        return list;
    }
    protected abstract T buildObject(String[] fields);
}
public class csvBParser extends AbstractCSVParser<ObjectB> {
    protected ObjectB buildObject(String[] fields) {
        ObjectB obj = new ObjectB();
        // populate object with fields
        return obj;
    }
}
```
In the parsing code, the InputStream and BufferedReader are never closed. Make sure you close resources when you are done with them, and make sure it is happening inside a finally block.
```
BufferedReader br = new BufferedReader(new InputStreamReader(is));
try {
    // do work with br
}
finally {
    br.close();
}
```
If you are using Java 7, you should check out the try-with-resources statement. The above example could be written like this:
```
try (BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
    // do work with br
} // close is called automatically at the end of the try block
```
In this example, the parser classes appear to be stateless. Given this, CSVParserFactory doesn't really need to return a new instance every time. One option to simplify CSVParserFactory is to initialize a static Map containing the parser instances.
```
public class CSVParserFactory {
    private static final Map<TableType, CSVParser> PARSERS = new HashMap<TableType, CSVParser>() {{
        put(A, new csvAParser());
        put(B, new csvBParser());
        put(C, new csvCParser());
        // etc
    }};
    public static CSVParser getParser(TableType type) {
        CSVParser parser = PARSERS.get(type);
        if (parser == null) {
            throw new IllegalArgumentException("No such table type");
        }
        return parser;
    }
}
```
If you don't want this, and still want to create the parser instances each time, you could use the same strategy with the class names instead of instances, then create the instance using reflection.
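The reflection variant mentioned above might look roughly like the following sketch. It stores `Class` objects rather than class-name strings, which is a small substitution for what the text describes that avoids a `Class.forName` lookup. All types here (`TableType`, the empty `CSVParser` interface, `CsvAParser`, `CsvBParser`, `ReflectiveParserFactory`) are stand-ins declared only so the example compiles on its own. It also assumes each parser has a no-argument constructor.

```java
import java.util.HashMap;
import java.util.Map;

// Small demo entry point: two lookups for the same type yield two distinct instances.
class ReflectiveFactoryDemo {
    public static void main(String[] args) {
        CSVParser first = ReflectiveParserFactory.newParser(TableType.A);
        CSVParser second = ReflectiveParserFactory.newParser(TableType.A);
        System.out.println("distinct instances: " + (first != second));
    }
}

// Stand-in types (assumptions, not the question's real classes).
enum TableType { A, B, C }

interface CSVParser { }

class CsvAParser implements CSVParser { }

class CsvBParser implements CSVParser { }

final class ReflectiveParserFactory {
    // Maps each table type to the parser class to instantiate; C is deliberately unmapped.
    private static final Map<TableType, Class<? extends CSVParser>> PARSER_CLASSES =
            new HashMap<TableType, Class<? extends CSVParser>>();

    static {
        PARSER_CLASSES.put(TableType.A, CsvAParser.class);
        PARSER_CLASSES.put(TableType.B, CsvBParser.class);
    }

    private ReflectiveParserFactory() { }

    // Creates a fresh parser on every call via the class's no-argument constructor.
    static CSVParser newParser(TableType type) {
        Class<? extends CSVParser> parserClass = PARSER_CLASSES.get(type);
        if (parserClass == null) {
            throw new IllegalArgumentException("No such table type: " + type);
        }
        try {
            return parserClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + parserClass.getName(), e);
        }
    }
}
```

The trade-off is that a missing or inaccessible constructor only shows up at run time, whereas the instance map fails at class initialization.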
Finally, you may want to consider passing an InputStream to parseCSVFile instead of a file name. This would allow more flexibility in the source of the data: the current implementation only works with files found on the classpath.
Code Snippets
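The answer's closing suggestion, accepting an InputStream rather than a file name, could be sketched as follows. `StreamCsvReader` and `readRows` are hypothetical names. The sketch returns raw string fields instead of mapped objects, and it assumes UTF-8 input, a header row, no quoted fields, and Java 7 or later.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Small demo entry point using in-memory bytes instead of a classpath resource.
class StreamCsvDemo {
    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "id,count\n1,10\n2,20\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(StreamCsvReader.readRows(in).size() + " data rows");
    }
}

final class StreamCsvReader {
    private StreamCsvReader() { }

    // Reads all data rows from any InputStream; the stream is closed when reading finishes.
    static List<String[]> readRows(InputStream in) throws IOException {
        List<String[]> rows = new ArrayList<String[]>();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line = reader.readLine(); // assumption: the first line is a header and is skipped
            while ((line = reader.readLine()) != null) {
                rows.add(line.split(",", -1)); // naive split, no quoting support
            }
        }
        return rows;
    }
}
```

A caller that still wants the classpath lookup from the question can pass the result of `getResourceAsStream(fileName)`, while a unit test can pass a `ByteArrayInputStream` as in the demo. Closing the stream inside `readRows` is a design choice of this sketch; leaving that to the caller is equally defensible.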
```
public abstract class AbstractCSVParser<T> implements CSVParser<T> {
    public List<T> parseCSVFile(String fileName) {
        List<T> list = new ArrayList<T>();
        // for each line...
        String[] fields = // parse line into an array of strings
        T obj = buildObject(fields);
        list.add(obj);
        return list;
    }
    protected abstract T buildObject(String[] fields);
}
public class csvBParser extends AbstractCSVParser<ObjectB> {
    protected ObjectB buildObject(String[] fields) {
        ObjectB obj = new ObjectB();
        // populate object with fields
        return obj;
    }
}
```
```
BufferedReader br = new BufferedReader(new InputStreamReader(is));
try {
    // do work with br
}
finally {
    br.close();
}
```
```
try (BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
    // do work with br
} // close is called automatically at the end of the try block
```
```
public class CSVParserFactory {
    private static final Map<TableType, CSVParser> PARSERS = new HashMap<TableType, CSVParser>() {{
        put(A, new csvAParser());
        put(B, new csvBParser());
        put(C, new csvCParser());
        // etc
    }};
    public static CSVParser getParser(TableType type) {
        CSVParser parser = PARSERS.get(type);
        if (parser == null) {
            throw new IllegalArgumentException("No such table type");
        }
        return parser;
    }
}
```
Context
StackExchange Code Review Q#25354, answer score: 3