patternMinor
Perl script to analyze CSV files
Viewed 0 times
scriptanalyzeperlcsvfiles
Problem
I made a Perl script to analyze CSV files. However, it is running a little slow. Is there anything that I can change to make it more efficient?
```
#!/usr/bin/perl -w
use strict;
use warnings;
use Time::Piece;
use List::Util qw( min max );
use List::MoreUtils qw(first_index);
use Tk;
use CGI;
my $current_file=();
my $mw = new MainWindow;
my $testType;
my $code_font = $mw->fontCreate('code', -weight => 'bold',-size => 12);
my $cb;
my $right_frame;
my $title = $mw -> Label(-text=>"\nLog File Analyzer\n", -font => 'code' ) -> pack();
my $label = $mw -> Label(-text=>"To run this generator the columns of your LogFile should be the following:\nIndex - Time - Summary - ID - DataLength - Data...\nIf this is not the format of the file it will not work.\n") -> pack();
my $browseButton = $mw -> Button(-text => "Browse", -command => \&open_file)-> pack();
$cb = $mw->Checkbutton(-text => 'Generator Test',-variable => \$testType, -onvalue => 1 , -offvalue => 0)->pack();
$cb = $mw->Checkbutton(-text => 'Other', -onvalue => 2, -offvalue => 0)->pack();
my $generateButton = $mw -> Button(-text => "Generate Report", -command => \&generate_report)->pack;
my $quitButton = $mw -> Button(-text => "Quit",-command => sub { exit })-> pack( -side => 'bottom');
sub open_file
{
my @types =
(["CSV files", [qw/.csv /]],
["All files", '*'],
);
$current_file= $mw->getOpenFile(-filetypes => \@types);
print "$current_file\n";
}
sub generate_report
{
sub str2time
{
my ($str) = @_;
$str =~ s/(\.[0-9]+)?\z//;
my $fraction = $1 || 0;
return Time::Piece->strptime($str, '%H:%M:%S')->epoch + $fraction;
}
#Variables
my $idNumber;
my $dataLength;
my $timeIntervals;
my $numberOfUniqueIDs;
my $numberOfMessages = 0;
my $initialTime;
my $finalTime;
my $timeLength;
my $firstNumber;
my $secondNumber;
my $differenceIntervals;
my $min;
my $max;
my $lineOfMin1;
my $lineOfMin2;
my $lineOfMax1;
my $lineOfMax2;
my $largestDLCMessage;
my $lin
```
#!/usr/bin/perl -w
use strict;
use warnings;
use Time::Piece;
use List::Util qw( min max );
use List::MoreUtils qw(first_index);
use Tk;
use CGI;
my $current_file=();
my $mw = new MainWindow;
my $testType;
my $code_font = $mw->fontCreate('code', -weight => 'bold',-size => 12);
my $cb;
my $right_frame;
my $title = $mw -> Label(-text=>"\nLog File Analyzer\n", -font => 'code' ) -> pack();
my $label = $mw -> Label(-text=>"To run this generator the columns of your LogFile should be the following:\nIndex - Time - Summary - ID - DataLength - Data...\nIf this is not the format of the file it will not work.\n") -> pack();
my $browseButton = $mw -> Button(-text => "Browse", -command => \&open_file)-> pack();
$cb = $mw->Checkbutton(-text => 'Generator Test',-variable => \$testType, -onvalue => 1 , -offvalue => 0)->pack();
$cb = $mw->Checkbutton(-text => 'Other', -onvalue => 2, -offvalue => 0)->pack();
my $generateButton = $mw -> Button(-text => "Generate Report", -command => \&generate_report)->pack;
my $quitButton = $mw -> Button(-text => "Quit",-command => sub { exit })-> pack( -side => 'bottom');
sub open_file
{
my @types =
(["CSV files", [qw/.csv /]],
["All files", '*'],
);
$current_file= $mw->getOpenFile(-filetypes => \@types);
print "$current_file\n";
}
sub generate_report
{
sub str2time
{
my ($str) = @_;
$str =~ s/(\.[0-9]+)?\z//;
my $fraction = $1 || 0;
return Time::Piece->strptime($str, '%H:%M:%S')->epoch + $fraction;
}
#Variables
my $idNumber;
my $dataLength;
my $timeIntervals;
my $numberOfUniqueIDs;
my $numberOfMessages = 0;
my $initialTime;
my $finalTime;
my $timeLength;
my $firstNumber;
my $secondNumber;
my $differenceIntervals;
my $min;
my $max;
my $lineOfMin1;
my $lineOfMin2;
my $lineOfMax1;
my $lineOfMax2;
my $largestDLCMessage;
my $lin
Solution
It is very difficult to read and understand the code, and in addition sample data for the log files that the program is supposed to read and process, is not given. This makes it unnecessarily hard to write a useful review. Anyway, the code illustrates programming habits that in general cannot be recommended, and as such is still a good candidate for a review. I will try to focus on the main points so you can get an idea and improve from there.
As I understand, the program is supposed to read a CSV log file and produce a summary of it. It will give the user some options for the type of summary to produce.
Your main question is:
It is running a little slow. Is there anything that I can change to
make it more efficient?
Since the code has so many issues that should be dealt with before any optimization can be considered (elementary programming errors, some logical errors, and also more serious problems in form of bad programming practices) and further realistic input data is missing, I choose not focus on how to improve the speed of the program here. Instead I will focus on programming practices and how to achieve maintainable code.
The current structure of the program can be divided into two parts:
The first parts sets up the GUI, whereas the second part reads the CSV file and generates the report when the user click the button.
It is quite obvious that the
In order to cope with it, a lot of comments have been added to the code.
This is a well-known antipattern. One should fix the code by refactoring it, not by commenting it. For a more in-depth treatment, I can recommend the talk Søren Lund at YAPC EU 2016: "Documenting Code Patterns and Anti-Patterns".
So how can the situation be improved? The solution is to provide more structure by refactoring the large function into smaller pieces. Since the generation of the report seems like a well-defined concept and since its implementation requires more than a simple function of, say 60 lines of code, I suggest it is refactored into a separate module, and therein furthered refactored into several subroutines.
The program in itself including the GUI, also seems like an entity that could be reused. I therefore suggest to make that also into a module, leaving us with a refactorization into three source files:
The main difficulty with this refactorization seems to be that the report generator module needs to modify the GUI (which is set up by the analyzer module) and also needs access to the
The solution I propose here, is to encapsulate the information into an object, i.e. by making
An alternative (and perhaps cleaner solution?) could be to try to make
With all that in place (and all other programming errors fixed, see below) you could start doing real improvements to your code like:
file). When using a GUI, error messages should not be written to the parent process's (
or written to error log files.
More specific comments
I will now go through some issues related to style, logical error, and syntax errors in the code. These issues need to be adressed before major improvements can start.
Coding style
Coding style is important to make your code readable, understandable, and maintainable. The guidelines have been develo
As I understand, the program is supposed to read a CSV log file and produce a summary of it. It will give the user some options for the type of summary to produce.
Your main question is:
It is running a little slow. Is there anything that I can change to
make it more efficient?
Since the code has so many issues that should be dealt with before any optimization can be considered (elementary programming errors, some logical errors, and also more serious problems in form of bad programming practices) and further realistic input data is missing, I choose not focus on how to improve the speed of the program here. Instead I will focus on programming practices and how to achieve maintainable code.
The current structure of the program can be divided into two parts:
- Line 2-36:
usestatements, main script plusopen_file()function.
- Line 37-259 :
generate_report()function
The first parts sets up the GUI, whereas the second part reads the CSV file and generates the report when the user click the button.
It is quite obvious that the
generate_report function has become too long. It has been back-indented as not to disappear off the right end of the screen; it even has an inner sub routine (but not of lexical type!); further all declarations are done at the top of the subroutine, even for variables first used 200 lines below! ..and some of the variables are not used at all. Such a "beast" is obviously very difficult to maintain.In order to cope with it, a lot of comments have been added to the code.
This is a well-known antipattern. One should fix the code by refactoring it, not by commenting it. For a more in-depth treatment, I can recommend the talk Søren Lund at YAPC EU 2016: "Documenting Code Patterns and Anti-Patterns".
So how can the situation be improved? The solution is to provide more structure by refactoring the large function into smaller pieces. Since the generation of the report seems like a well-defined concept and since its implementation requires more than a simple function of, say 60 lines of code, I suggest it is refactored into a separate module, and therein furthered refactored into several subroutines.
The program in itself including the GUI, also seems like an entity that could be reused. I therefore suggest to make that also into a module, leaving us with a refactorization into three source files:
main.plusesMy::CsvAnalyzerto generate a GUI and produce a report.
My/CsvAnalyzer.pmimplements the classMy::CsvAnalyzerwhich creates the GUI through anew()and arun()method. It uses an object of classMy::CsvAnalyzer::ReportGeneratorto generate the report.
My/CsvAnalyzer/ReportGenerator.pmimplements the classMy::CsvAnalyzer::ReportGeneratorwhich generates the report.
The main difficulty with this refactorization seems to be that the report generator module needs to modify the GUI (which is set up by the analyzer module) and also needs access to the
$current_file and $testType variables used by Tk::Button and Tk::Checkbutton. These variables are defined in My/CsvAnalyzer.pm.The solution I propose here, is to encapsulate the information into an object, i.e. by making
My::CsvAnalyzer->new() return an object $analyzer with the two mentioned fields. Now, $analyzer->run() will pass a reference to itself ($self) and the GUI ( $mw ) to My::CsvAnalyzer::ReportGenerator->new(). In this way, the report generator will have access to both the GUI and the variables.An alternative (and perhaps cleaner solution?) could be to try to make
My::CsvAnalyzer::ReportGenerator independent of the GUI, and let the parent My::CsvAnalyzer handle messages and errors that should be displayed in the GUI.With all that in place (and all other programming errors fixed, see below) you could start doing real improvements to your code like:
- Improve the design of the GUI
- How the GUI should report errors (in user input, and in the read CSV
file). When using a GUI, error messages should not be written to the parent process's (
main.pl) STDERR using die ".." or to its STDOUT using print/say. Instead error messages should be displayed in dialogsor written to error log files.
More specific comments
I will now go through some issues related to style, logical error, and syntax errors in the code. These issues need to be adressed before major improvements can start.
Coding style
Coding style is important to make your code readable, understandable, and maintainable. The guidelines have been develo
Code Snippets
if ( $a == 1 ) {
....
}
else {
....
}sub open_file
{sub open_file {my $code_font = $mw->fontCreate('code', -weight => 'bold', -size => 12);$mw -> fontCreate('code', -weight => 'bold', -size => 12);Context
StackExchange Code Review Q#143557, answer score: 7
Revisions (0)
No revisions yet.