patternMinor
Some simple Perl and Regex
Viewed 0 times
simpleperlsomeandregex
Problem
Reads from a flat file of three columns, delimited by at least one tab character. Filters the three columns based on input; returns list of corresponding emails. Arguments passed like
I realize some people may not be down with the
key=value or key= for blank.#!/bin/perl
use warnings;
use strict;
sub get_list {
my $cols = {'list'=>0, 'email'=>1, 'sub'=>2};
my $list = '[a-z0-9_\- ]+';
my $email = '[a-z0-9@\._\-]+';
my $subls = '[a-z0-9\. ]+';
my $return = $cols->{'email'};
foreach (@ARGV) {
my ($key, $value) = split '=';
$value = q{} if not defined $value;
if ($key eq 'list') { $list = $value; }
elsif ($key eq 'email') { $email = $value; }
elsif ($key eq 'sub') { $subls = $value; }
elsif ($key eq 'return') {
$return = $cols->{$value}
if defined $cols->{$value};
}
else { die "invalid key: $key\n"; }
}
my $file = 'input.txt';
open FH, ' 'tacos' }
grep m/^${list}\s\t\s${email}(\s\t\s${subls})?\s*$/gi,
;
close FH;
return (join ', ', keys %emails);
}
print get_list(@ARGV);
print "\n";
# test input... three columns delimited by at least one tab
# third column may be blank. nothing should contain spaces
# in real life, but i'm allowing it...
# archive email@email.com success
# archive email2@email.com fail
# archive email2@email.com success
# archive email3@email.com
# archive email@email.com fail
# taco email@email.com success
# taco email2@email.com fail
# taco email2@email.com success
# taco meat email3@email.com
# tacor email@email.com fail
I realize some people may not be down with the
$thing = $value if defined $value; arrangement but I went ahead and used it since the logic wasn't complex. I prefer it over using space for brackets on one line conditionals. Also, please note, I'm not trying to validate email addresses with regex and I these are truly the onlSolution
my $cols = {'list'=>0, 'email'=>1, 'sub'=>2};I'd call this
my %index_of = { 'list' => 0, 'email' => 1, 'sub' => 2 };and later, change
$return to my $index = $index_of{'email'};The following is used oddly:
print get_list(@ARGV);
foreach (@ARGV) {You are calling the function with parameters
@ARGV but you never use the parameters in the function. Instead, you use @ARGV again. Better would be to use the function parameters:my @arguments = @_;
foreach my $argument (@arguments) {
my ($key, $value) = split {=}, $argument;I also switched to using a named element instead of letting it default to
$_. Alternately, you could pass nothing into the function and use
@ARGV in the foreach as you did. That would work but isn't a best practice. You may not have realized but the
'=' was a regular expression with single quotes as the delimiters. I prefer to use curly brackets for regular expressions. If you wanted to be really plain about it, you could have said qr{=}. # default patterns that can be overridden by command line arguments
my $list = qr{[a-z0-9_\- ]+};
my $email = qr{[a-z0-9@\._\-]+};
my $subls = qr{[a-z0-9\. ]+};
$value = qr{} if not defined $value;Using
qr{} here makes it more obvious that these are regular expression patterns. open my $fh, '<', $fileI'd prefer a variable here rather than a globally scoped bareword.
my $SEPARATOR = qr{\s*\t\s*}
my %emails = map { (split $SEPARATOR)[$return] => 'tacos' }
grep m{^${list}${SEPARATOR}${email}(${SEPARATOR}${subls})?\s*$}gi,
;By defining
$SEPARATOR, we make the regular expressions clearer and reduce the amount of repeated code. my @entries = ();
while ( my $line = ) {
@columns = m{\A(${list})${SEPARATOR}(${email})(${SEPARATOR}(${subls}))?\s*\z}gi;
if ( scalar @columns >= 2 ) {
push @entries, $columns[$index];
}
}
close $fh;
return (join ', ', @entries);This is clearer about what it is doing than the
map of the grep results. Also, it doesn't parse each line twice. And of course it doesn't define a hash just to get the keys. In general, I would make more use of paragraphing. This is extra new lines to separate blocks of code. This allows you to easily see what pieces of code go together. For example, note how I separate the
return from the file handling code. You have comments explaining what the file format should be, but you don't explain what the command line arguments should be. A user would have to read the code to figure it out.
Your function does two distinct things. First, it reads the command line arguments. Second, it opens a file and processes it based on the command line arguments. It would be better if you could separate that into two functions.
Code Snippets
my $cols = {'list'=>0, 'email'=>1, 'sub'=>2};my %index_of = { 'list' => 0, 'email' => 1, 'sub' => 2 };my $index = $index_of{'email'};print get_list(@ARGV);
foreach (@ARGV) {my @arguments = @_;
foreach my $argument (@arguments) {
my ($key, $value) = split {=}, $argument;Context
StackExchange Code Review Q#71041, answer score: 4
Revisions (0)
No revisions yet.