patternMinor
Configuring fields to be extracted by a data processor
Viewed 0 times
extractedprocessorfieldsdataconfiguring
Problem
I solved this problem using closures with Perl and I wonder if/how I could/should have done it differently?
Background:
User creates config file describing properties of some source data file that would be processed:
Examples on the last (if delimited):
or (fixed width):
My Perl reads the config, and prepares ahead the action to take once I start reading the source data file.
Internally, I'm creating preparing my master hash (per the config), which I'll loop on for each record when reading the source.
Now to handle the said examples, I have this code, if the source file would be delimited:
or fixed width:
Is there a better or another way of doing this, or any other suggestions?
Background:
User creates config file describing properties of some source data file that would be processed:
- Type of file delimited (and the delim) or fixed width
- How many fields to expect
- Which fields are to be pulled out
Examples on the last (if delimited):
Fields: FIRST_NAME 2
Fields: LAST_NAME 3
Fields: ADDRESS 4or (fixed width):
Fields: FIRST_NAME 58,30
Fields: LAST_NAME 88,30
Fields: ADDRESS 118,50My Perl reads the config, and prepares ahead the action to take once I start reading the source data file.
Internally, I'm creating preparing my master hash (per the config), which I'll loop on for each record when reading the source.
Now to handle the said examples, I have this code, if the source file would be delimited:
sub ret_col_sub
{
my ($col_name, $col_pos) = @_
warn qq($col_name col_pos='$col_pos' is not a number) if $col_pos =~ /\D/;
$col_pos--;
# Return closure to caller, which will be called later when we're reading the source data file
return sub
{
my $rec = shift;
return $rec->[$col_pos];
};
}or fixed width:
sub ret_substr_sub
{
my ($col_name, $substr_params) = @_;
my ($offset, $len) = $substr_params =~ /(\d+)\s*,\s*(\d+)/;
# The offset might be ZERO, check for length ... but $len must always be P O S I T I V E !
warn qq($col_name substr_params='$substr_params' is missing the offset and length needed by substr) if not length $offset or not $len;
$offset--;
$len--;
# Return closure to caller, which will be called later when we're reading the source data file
return sub
{
my $rec = shift;
return substr $rec, $offset, $len;
};
}Is there a better or another way of doing this, or any other suggestions?
Solution
This seems like a desperate attempt to find a use for closures, and I believe your problem would be solved much more simply by making
Some observations:
-
It makes the API much less useful when the calling code has to know which form of data it is dealing with anyway.
-
I can see that
-
Checking that a string doesn't match
-
It is correct to rebase the substring offset, but a length is a length and that should be left untouched
-
The postfixed
-
As has been said, your style contravenes common standards, in particular those laid out in
$rec an object, perhaps even a tied object that lets you overload indexingSome observations:
-
It makes the API much less useful when the calling code has to know which form of data it is dealing with anyway.
ret_col_sub returns a closure that expects a simple string for its parameter, while ret_substr_sub returns a closure that requires an array reference. All of that should be hidden from the caller-
I can see that
ret_col_sub won't compile-
Checking that a string doesn't match
\D isn't sufficient to ensure that it is numeric, as an empty string will pass that test-
It is correct to rebase the substring offset, but a length is a length and that should be left untouched
-
The postfixed
++ and -- operators have become popular because of the name of the C++ language, but unless you are using the value returned by tge operator you should be using tge prefixed version. It is wrong to rely on the compiler to optimise out the storage of the unused original value-
As has been said, your style contravenes common standards, in particular those laid out in
perlstyle which are the most universalContext
StackExchange Code Review Q#121581, answer score: 3
Revisions (0)
No revisions yet.