patternjavaMinor
Data type detection for a CSV file parser
Viewed 0 times
fileparsercsvtypefordetectiondata
Problem
I am building a CSV file parser, and in order to get the appropriate Object to represent different data types found on the parsed files, I wrote the following function:
In order to call the appropriate 'casting' methods (
Hence, the pattern value gets used for routing to the appropriate function, and also as an argument in that function call.
To make this method logic 'flatter', I could avoid passing the pattern as an argument, and calling the pattern detection method (
So, is this the best way this method can be written without getting to call the detection methods twice?
EDIT
Here are the detectDate and stringToDate methods, as requested (the methods for BigInteger and BigDecimal are very similar). Please let me know if I need to post anything else:
```
private static String detectDate(String dateStrin
public static Object stringToDataType(String valueAsString) throws ParseException{
// detections ordered by probability of occurrence in Buffer_Bank.
String decimalPattern = detectDecimal(valueAsString);
if(decimalPattern != null){
return stringToBigDecimal(valueAsString, decimalPattern);
}else{
String integerPattern = detectInteger(valueAsString);
if(integerPattern != null){
return stringToBigInteger(valueAsString);
}else{
String datePattern = detectDate(valueAsString);
if(datePattern != null){
return stringToDate(valueAsString, datePattern);
}else{
// value is a String... nothing else to do!
return valueAsString;
}
}
}
}In order to call the appropriate 'casting' methods (
stringToDate, stringToBigInteger, stringToBigDecimal), I need to use the specific pattern string that gets returned when detecting if a particular value is of that type.Hence, the pattern value gets used for routing to the appropriate function, and also as an argument in that function call.
To make this method logic 'flatter', I could avoid passing the pattern as an argument, and calling the pattern detection method (
detectDecimal, detectInteger, detectDate) again from within the 'casting' method, but that would lead to parsing the same value two times, which seems wasteful at best. So, is this the best way this method can be written without getting to call the detection methods twice?
EDIT
Here are the detectDate and stringToDate methods, as requested (the methods for BigInteger and BigDecimal are very similar). Please let me know if I need to post anything else:
```
private static String detectDate(String dateStrin
Solution
Yes, you can make that code a lot more readable by noticing a simple fact. When you have the code
The
So you can rewrite it like this:
As you found out, this has the advantage that the subsequent code is shifted by one indentation to the left, making it easier to read. So only with this change, the code now looks like:
You posted an example of your
Said like this, it is possibe to realize that you only need 3 methods:
Note that I changed your code so that it loops directly around the entries instead of looping just with the keys. You should also decide what to do in case of a
The big issue is that there is some duplicated logic in this: each time we are detecting a pattern and, if it's not null, applying it to the value. If you were to add more detection algorithm, this method can become very quickly copy/pasted code and clumsy. We need to refactor that and make it generic.
Using Java 8, you can define that as a list of
The goal of those functions is to convert a
This creates a Stream pipeline over the functions. Each of them is applied to the given
If somehow you want to keep your current approach with two different methods (one for detecting the pattern and one for parsins), then what you want comes down to creating a custom class that will hold:
Simply, it could be:
```
public final class Conversion {
private final UnaryOperator patternDetector;
private final BiFunction toObjectFunction;
public Conversion(UnaryOperator patternDetector, BiFunction toObjectFunction) {
this.patternDetector = pat
String decimalPattern = detectDecimal(valueAsString);
if(decimalPattern != null){
return stringToBigDecimal(valueAsString, decimalPattern);
}else{
// ...
}The
else part is actually unnecessary. If the test is true then we return with a value. This means that you exit early from the method, without going further down.So you can rewrite it like this:
String decimalPattern = detectDecimal(valueAsString);
if(decimalPattern != null){
return stringToBigDecimal(valueAsString, decimalPattern);
}
// ...As you found out, this has the advantage that the subsequent code is shifted by one indentation to the left, making it easier to read. So only with this change, the code now looks like:
public static Object stringToDataType(String valueAsString) throws ParseException {
// detections ordered by probability of occurrence in Buffer_Bank.
String decimalPattern = detectDecimal(valueAsString);
if (decimalPattern != null) {
return stringToBigDecimal(valueAsString, decimalPattern);
}
String integerPattern = detectInteger(valueAsString);
if (integerPattern != null) {
return stringToBigInteger(valueAsString);
}
String datePattern = detectDate(valueAsString);
if (datePattern != null) {
return stringToDate(valueAsString, datePattern);
}
return valueAsString;
}You posted an example of your
detect and stringTo methods. In fact, you don't need two methods. You only need a single one that is specific to the current type you want to test. What you want is:- Try to interpret the value as a decimal and return it;
- If not, try to interpret the value as an integer and return it;
- If not, try to interpret the value as a date and return it.
Said like this, it is possibe to realize that you only need 3 methods:
tryDate, tryInteger and tryDecimal. Each of those will do the necessary to try to interpret the String and return null if they can't. Basically, the code you currently have inside stringTo should be merged into the part where you found the correct pattern in detect. As an example:private static Object tryDate(String dateString) {
for (Map.Entry entry : DATE_FORMAT_REGEXPS.entrySet()) {
if (dateString.toLowerCase().matches(entry.getKey())) {
SimpleDateFormat dateFormat = new SimpleDateFormat(entry.getValue());
try {
return dateFormat.parse(dateString);
} catch (ParseException e) {
// what to do in this case? Possibilites: throw an exception or return null
}
}
}
return null; // value is not a date
}Note that I changed your code so that it loops directly around the entries instead of looping just with the keys. You should also decide what to do in case of a
ParseException. You could rethrow a custom runtime exception, wrapping the original, or returning null to signal that the attempt failed.The big issue is that there is some duplicated logic in this: each time we are detecting a pattern and, if it's not null, applying it to the value. If you were to add more detection algorithm, this method can become very quickly copy/pasted code and clumsy. We need to refactor that and make it generic.
Using Java 8, you can define that as a list of
Function.private static final List> FUNCTIONS =
Arrays.asList(s -> tryDecimal(s), s -> tryInteger(s), s -> tryDate(s));The goal of those functions is to convert a
String into a Object. Here, we define 3 which are the 3 methods mentioned above. Then we can havepublic static Object stringToDataType(String valueAsString) {
return FUNCTIONS.stream()
.map(f -> f.apply(valueAsString))
.filter(Objects::nonNull)
.findFirst()
.orElse(valueAsString);
}This creates a Stream pipeline over the functions. Each of them is applied to the given
String, non-null elements are filtered and only the first one is kept. Effectively, this will select the first non-null element, so it will select the first mapping that matched. If all are null then we return a default value with orElse which is the given value as String.If somehow you want to keep your current approach with two different methods (one for detecting the pattern and one for parsins), then what you want comes down to creating a custom class that will hold:
- A function that returns the pattern detected given the value as String;
- A function that returns the object given the value as String and a pattern.
Simply, it could be:
```
public final class Conversion {
private final UnaryOperator patternDetector;
private final BiFunction toObjectFunction;
public Conversion(UnaryOperator patternDetector, BiFunction toObjectFunction) {
this.patternDetector = pat
Code Snippets
String decimalPattern = detectDecimal(valueAsString);
if(decimalPattern != null){
return stringToBigDecimal(valueAsString, decimalPattern);
}else{
// ...
}String decimalPattern = detectDecimal(valueAsString);
if(decimalPattern != null){
return stringToBigDecimal(valueAsString, decimalPattern);
}
// ...public static Object stringToDataType(String valueAsString) throws ParseException {
// detections ordered by probability of occurrence in Buffer_Bank.
String decimalPattern = detectDecimal(valueAsString);
if (decimalPattern != null) {
return stringToBigDecimal(valueAsString, decimalPattern);
}
String integerPattern = detectInteger(valueAsString);
if (integerPattern != null) {
return stringToBigInteger(valueAsString);
}
String datePattern = detectDate(valueAsString);
if (datePattern != null) {
return stringToDate(valueAsString, datePattern);
}
return valueAsString;
}private static Object tryDate(String dateString) {
for (Map.Entry<String, String> entry : DATE_FORMAT_REGEXPS.entrySet()) {
if (dateString.toLowerCase().matches(entry.getKey())) {
SimpleDateFormat dateFormat = new SimpleDateFormat(entry.getValue());
try {
return dateFormat.parse(dateString);
} catch (ParseException e) {
// what to do in this case? Possibilites: throw an exception or return null
}
}
}
return null; // value is not a date
}private static final List<Function<String, Object>> FUNCTIONS =
Arrays.asList(s -> tryDecimal(s), s -> tryInteger(s), s -> tryDate(s));Context
StackExchange Code Review Q#126979, answer score: 6
Revisions (0)
No revisions yet.