HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Creating two output files from two formatted files

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
creatingoutputformattedtwofilesfrom

Problem

I've been working on a project to take data from csv files, re-format it, and display it as charts. I'm working for a company over the summer and when I leave, I would like my code to be readable/editable by whoever takes over in the future.

This code reads in two files, formats them, and creates two output files. It takes in one command line argument as a filepath to the source folder for the data. The formatting just counts the number of occurrences of a word and increments a corresponding value in a 2D array.

Would you understand what this is doing if I hadn't given an explanation? Should I change anything about it to make it more readable?

```
#!/usr/bin/perl

use strict;
use warnings;

my $src = $ARGV[0];
my %files = (
'LM' => [$src . '\raw\casesByCategory_LM.csv', $src . '\casesByCategoryLM.csv'],
'TM' => [$src . '\raw\casesByCategory_TM.csv', $src . '\casesByCategoryTM.csv'],
);

foreach my $file (keys %files) {
my @categories = (
['RMA Request', 0],
['Product Info Request', 0],
['Dealer Locator (New or Resupply)', 0],
['Order Request', 0],
['Order Status', 0],
['Wireless Activation/Deactivation', 0],
['Pricing', 0],
['Account Setup/Registration', 0],
['Other', 0],
);
my $total = 0;
open (DATA, $files{$file}[0]) or die $!;

# Remove header
;
my $match = 0;
while (my $line = ) {
# Remove \n character
chomp $line;
# Search through list of categories to find a match
if ($line =~ /^"(.+)"$/) {
foreach my $cat (@categories) {
if ($1 eq $cat->[0]) {
$cat->[1]++;
$match = 1;
}
}
}
# Add case to 'Other' if no match was found
if (not $match) {
$categories[8]->[1]++;
} else {
$match = 0;

Solution

All in all I can understand it OK, it reads fairly well. There are a few structures I would recommend you change.... for a start, I would separate the category definitions from the counters, as it would be easier to maintain:

Note, I am defining @categories as a simple array, not a nested array here.... so it is a different thing from yours:

my @categories = ('RMA Request', 'Product Info Request', 'Dealer Locator (New or Resupply)',
                  'Order Request', 'Order Status', 'Wireless Activation/Deactivation',
                  'Pricing', 'Account Setup/Registration', 'Other');

my %catcounts;
foreach (@categories) {
    $catcounts{$_} = 0;
}


OK, this does three things, it makes adding/changing the categories easy, and it separates the count setup from the category setup. It also sets up a hash fro the counter, so you don't need to loop through them all but can do it by key, so the code:

if ($line =~ /^"(.+)"$/) {
            foreach my $cat (@categories) {
                if ($1 eq $cat->[0]) {
                    $cat->[1]++;
                    $match = 1;
                }
            }
        }
        # Add case to 'Other' if no match was found
        if (not $match) {
            $categories[8]->[1]++;
        } else {
            $match = 0;
        }


can become:

if ($line =~ /^"(.+)"$/ && defined $catcount{$1}) {
            $catcount{$1}++;
        } else {
            $catcount{'Other'}++;
        }


This will be much faster because it uses the hash lookup instead of the array loop to find the category.

you will need to adapt the output to loop through the @categories, but print out the %catcount

Code Snippets

my @categories = ('RMA Request', 'Product Info Request', 'Dealer Locator (New or Resupply)',
                  'Order Request', 'Order Status', 'Wireless Activation/Deactivation',
                  'Pricing', 'Account Setup/Registration', 'Other');

my %catcounts;
foreach (@categories) {
    $catcounts{$_} = 0;
}
if ($line =~ /^"(.+)"$/) {
            foreach my $cat (@categories) {
                if ($1 eq $cat->[0]) {
                    $cat->[1]++;
                    $match = 1;
                }
            }
        }
        # Add case to 'Other' if no match was found
        if (not $match) {
            $categories[8]->[1]++;
        } else {
            $match = 0;
        }
if ($line =~ /^"(.+)"$/ && defined $catcount{$1}) {
            $catcount{$1}++;
        } else {
            $catcount{'Other'}++;
        }

Context

StackExchange Code Review Q#58774, answer score: 3

Revisions (0)

No revisions yet.