Automatic report generation from BibTeX
Problem
I wrote the following code for the question I asked here. Are there any catastrophic mistakes? (Actually, it did not really work with perltex, and I suspect I have made syntax mistakes.)
I also read about lists, hashes (list vs. hash is still a huge question mark in my mind) and foreach to improve my script. I am somewhat confused about the best way to combine them (my attempt to define the variables with a list has failed for now). I am sure I will find my way, but some help would get me there more quickly. You can see how I repeat the same lines again and again.
```
#!/usr/bin/perl
#
# Strict and warnings are recommended.
use strict;
use warnings;
no warnings "uninitialized"; # Don't warn about unitialized variables#
use Term::ANSIColor; #Some fun
# use 5.010;
# print colored ['bold blue'], "Hey I am a Perl,\n";
# print "I can find all the informations you need for a specific reference from regular";
# print colored ['bold blue'], " bibtex ";
# print "file\n";
my $in = 0; #logical to check am I in a good line or not
my $refPattern='Wang2013'; #which reference I am trying to read
# my $refPattern='test2013';
# my $refPattern='',"$_[0]",''; #which reference I am trying to read
my $filename = '/home/solak/WORKDIR/ARTICLE/allReferences.bib'; #bibtex file storing informations
foreach (@ARGV) {
$refPattern = $_;
};
# print "My variable $refPattern\n";
#possible to improve with list or hash, find them with forearch etc. (did not work yet)
my $abstract ; #lesson:perl scope has crazy/sensible way
my $author ; #lesson:perl scope has crazy/sensible way
my $doi ; #lesson:perl scope has crazy/sensible way
my $issn ; #lesson:perl scope has crazy/sensible way
m
```
Solution
There are some nice things about this script: it seems to work, you're using `strict`, and how you open your files is very good (you even consider encoding!).
Unfortunately, it is extremely difficult to follow. Reasons for this are:
- This program has experienced “organic growth”. There are unnecessary `print` commands for debugging everywhere, and commented-out code has not been removed. That will be the first thing we'll get rid of.
- You are not very experienced with Perl. Your logic is encoded in a roundabout way, and you are not aware of techniques for better abstraction and simplification.
- You seem to love fixed-width lines, without any need. Sometimes it may be useful to vertically align related items, but you take this to absurd lengths.

Let's start with your `while (<$bibfile>)` loop. The control flow in this loop is:
```
for each line:
    set $in if a pattern matches
    break if we encounter an empty line
    if $in is set:
        set $line to current line if $in is set
        parse the $line
        next iteration if $content is empty
        set the correct variable to the $content
```
The complete second part of the loop body is an `if`. There is no `else`. This can be replaced by going to the next iteration if the `if`-condition is false. Both have the same effect of skipping over the rest of the loop body. Also, the line `$line = $_ if $in` is unnecessary once we already know that `$in == 1`.
We can get rid of the `$in` variable by splitting the `while` loop into two loops. The first loop only searches for the marker pattern. The second loop does the parsing.
We can also get rid of the `$foo = $content if $field eq "foo"` copy-and-paste madness. You correctly noted that a hash makes sense here. We do not have to declare the fields of a hash up front. Instead, we declare an empty hash like `my %hash`, and fill in fields like `$hash{$field} = $content`.
Your parsing code can be drastically simplified by building a larger regex. Your syntax seems to be `$field = "$content",`. This can be expressed by a regex with two capture groups:
```
/\A \s* (\w+) \s*[=]\s* "(.+)" [,]? \s*\z/x
```
If that regex matches, the field name is in `$1` and the contents in `$2`.
Putting all of this together, we can rewrite your loop as:
```
my $marker = qr/$refPattern/i;
my %fields;

# find the marker
while (<$bibfile>) {
    if (/$marker/) {
        parse_line($_, \%fields);
        last;
    }
}

# parse the remaining lines of the entry, up to the first blank line
while (my $line = <$bibfile>) {
    last if not $line =~ /\S/;
    parse_line($line, \%fields);
}

# store the field from a line of the form `field = "content",` in the hash
sub parse_line {
    my ($line, $fields) = @_;
    return if not $line =~ /\A \s* (\w+) \s*[=]\s* "(.+)" [,]? \s*\z/x;
    my ($field, $contents) = ($1, $2);
    $fields->{$field} = $contents;
    return;
}
```
In the above code I did two things that may be new to you:
- I declared a subroutine called `parse_line`, which conveniently encapsulates all the line parsing logic. This is good because the first line is parsed separately from the other lines, and I didn't want to repeat all the code. The `parse_line` subroutine takes two arguments: the line and the fields.
- While Perl subroutines cannot take a hash as a single argument (it would be flattened into a list of keys and values), they can take a reference to a hash as argument. This is similar to pointers in C, except that Perl uses the `\` backslash operator to get a reference to something. Because we are dealing with a reference and not with a hash, we don't access the field as `$hash{$field}` but as `$hashref->{$field}`.

Now since we store all the fields in a hash, we don't have variables like `$title` any more. This is no problem, since we can substitute that with `$fields{title}` in the remainder of your code.
The rest of your code is just a template. However, you print each line out separately, and explicitly add newlines. If you `use 5.010` or `use feature 'say'`, we unlock the `say` function that behaves exactly like `print`, but adds a newline to the end of its arguments. In this case, I'd prefer to underline the template character of this code instead, by defining a mini template language: `s/[$][{](\w+)[}]/$fields{$1}/eg`. That substitution replaces any word in braces preceded by a dollar sign by the contents of the field of the same name in `%fields`.
The template itself can be stored in the `__DATA__` section of the script. Everything after the `__DATA__` marker is not interpreted as Perl code, but is available via the special file handle `DATA`.
Implementing that might look like this:
```
# fill additional fields
$fields{refPattern} = $refPattern;

# process template: slurp the __DATA__ section, then substitute the ${name} placeholders
my $template = join '', <DATA>;
$template =~ s/[$][{](\w+)[}]/$fields{$1}/eg;

# output result
open my $fh, '>', $filename or die "Could not open file '$filename': $!";
print $fh $template;

__DATA__
%HEY This is generated by perl.pl, do not touch it, if you need changes go to mother file!
\documentclass[12pt]{standalone}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{tikz}
\usepackage{gnuplot-lua-tikz}
\usepackage[shell]{gnuplottex}
\usepackage[]{tcolorbox}
\usetikzlibrary{backgrounds,calc,positioning}
\thispagestyle{empty}
```
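If you want to try these pieces without touching your real files, here is a minimal self-contained sketch. It assumes the `field = "content",` syntax discussed above. The sample entries, the helper names `read_entry` and `fill_template`, and the three-line template are made up for illustration and are not part of your script. It also reads from an in-memory string instead of the .bib file, keeps the template in a heredoc rather than in the `__DATA__` section, and replaces unknown placeholders with an empty string, which is my choice rather than anything your script requires.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse one line; store the field in the hash that $fields refers to.
sub parse_line {
    my ($line, $fields) = @_;
    return if $line !~ /\A \s* (\w+) \s*[=]\s* "(.+)" [,]? \s*\z/x;
    $fields->{$1} = $2;
    return;
}

# Hypothetical helper: collect the fields of the first entry matching $marker.
sub read_entry {
    my ($fh, $marker) = @_;
    my %fields;
    # first loop: skip ahead to the line that contains the marker
    while (my $line = <$fh>) {
        if ($line =~ $marker) {
            parse_line($line, \%fields);
            last;
        }
    }
    # second loop: parse field lines until the first blank line
    while (my $line = <$fh>) {
        last if $line !~ /\S/;
        parse_line($line, \%fields);
    }
    return \%fields;
}

# Hypothetical helper: replace every ${name} placeholder with the field of
# that name; names that are not in the hash become an empty string.
sub fill_template {
    my ($template, $fields) = @_;
    $template =~ s!\$\{(\w+)\}!$fields->{$1} // ''!eg;
    return $template;
}

# Made-up sample data; entries are separated by blank lines.
my $bib = <<'END_BIB';
@article{Other2012,
  title = "Should be skipped",
}

@article{Wang2013,
  title = "A sample title",
  author = "Wang, A. and Doe, J.",
  journal = "Journal of Examples"
}

@article{Later2014,
  title = "Should not be reached",
}
END_BIB

# Made-up miniature template; the single-quoted heredoc keeps ${...} literal.
my $template = <<'END_TEMPLATE';
\node[styTITLE] (nodeTITLE) {${title}};
\node[styBIBR]  (nodeBIBR)  {${refPattern}};
\node[styJRNL]  (nodeJRNL)  {${journal}};
END_TEMPLATE

my $refPattern = 'Wang2013';
open my $fh, '<', \$bib or die "Could not open in-memory file: $!";
my $fields = read_entry($fh, qr/$refPattern/i);
close $fh;

$fields->{refPattern} = $refPattern;
print fill_template($template, $fields);
```

Running it prints the three `\node` lines with the title, `Wang2013` and the journal name filled in. Because `read_entry` returns a hash reference, the fields are accessed as `$fields->{title}` rather than `$fields{title}`.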
Code Snippets
```
for each line:
    set $in if a pattern matches
    break if we encounter an empty line
    if $in is set:
        set $line to current line if $in is set
        parse the $line
        next iteration if $content is empty
        set the correct variable to the $content
```
```
/\A \s* (\w+) \s*[=]\s* "(.+)" [,]? \s*\z/x
```
```
my $marker = qr/$refPattern/i;
my %fields;
# find the marker
while (<$bibfile>) {
    if (/$marker/) {
        parse_line($_, \%fields);
        last;
    }
}
while (my $line = <$bibfile>) {
    last if not $line =~ /\S/;
    parse_line($line, \%fields);
}
sub parse_line {
    my ($line, $fields) = @_;
    return if not $line =~ /\A \s* (\w+) \s*[=]\s* "(.+)" [,]? \s*\z/x;
    my ($field, $contents) = ($1, $2);
    $fields->{$field} = $contents;
    return;
}
```
```
# fill additional fields
$fields{refPattern} = $refPattern;
# process template
my $template = join '', <DATA>;
$template =~ s/[$][{](\w+)[}]/$fields{$1}/eg;
# output result
open my $fh, '>', $filename or die "Could not open file '$filename': $!";
print $fh $template;
__DATA__
%HEY This is generated by perl.pl, do not touch it, if you need changes go to mother file!
\documentclass[12pt]{standalone}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{tikz}
\usepackage{gnuplot-lua-tikz}
\usepackage[shell]{gnuplottex}
\usepackage[]{tcolorbox}
\usetikzlibrary{backgrounds,calc,positioning}
\thispagestyle{empty}
\begin{document}
\begin{tikzpicture}[inner sep=0pt]
\tikzstyle{styBIBR} = [draw,fill=black!60,minimum height={30pt},rectangle,text width=5.5cm,text centered,font=\bf,text=white,font=\huge];
\tikzstyle{styYEAR} = [draw,fill=blue!60,minimum height={20pt},rectangle,text width=5.5cm,text centered,font=\bf,text=white];
\tikzstyle{styTITLE}= [draw,,rectangle,minimum height={60pt},text width=11.5cm,,font=\bf, font=\Large];
\tikzstyle{styATHR} = [draw,,rectangle,minimum height={30pt},text width=17cm,,font=\large];
\tikzstyle{styJRNL} = [draw,,minimum height={30pt},rectangle,text width=5.5cm,text centered,text=black,font=\small];
\tikzstyle{styKWRD} = [draw,rectangle,minimum height={30pt},text width=17cm,,font=\bf, font=\small];
\tikzstyle{styLCLF} = [draw,rectangle,minimum height={10pt},text width=17cm,text=blue, font=\footnotesize]; %local file
\tikzstyle{styURL} = [draw,fill=blue!20,rectangle,minimum height={20pt},text width=11.5cm,,font=\bf, font=\Large]; %local file
\tikzstyle{styNOTES}= [draw,rectangle,minimum height={380pt},text width=17cm,];
\tikzstyle{styTODO} = [draw,rectangle,minimum height={110pt},text width=17cm];
\tikzstyle{styCTD} = [draw,rectangle,minimum height={90pt},text width=8.5cm];
\draw [draw,use as bounding box] (0cm,0cm) rectangle (17cm, 25cm);
%%%%%%%%%%%%%%%
% top group
%%%%%%%%%%%%%%%
\node[styTITLE,left=0pt of current bounding box.north west, ,anchor=north west](nodeTITLE) {${title}};
\node[styATHR,left=0pt of nodeTITLE.south west, anchor=north west](nodeATHR) {${refPattern}};
\node[styKWRD,left=0pt of nodeATHR.south, anchor=north](nodeKWRD) {Keywords: ${author}};
\node[styBIBR,left=0pt of nodeTITLE.north east, anchor=north west] (nodeBIBR) {${refPattern}};
\node[styJRNL,left=0pt of nodeBIBR.south, anchor=north] (nodeJRNL) {${journal}};
%%%%%%%%%%%%%%%
% bottom group
%%%%%%%%%%%%%%%
%local file link
\node[styLCLF,left=0pt of current bounding box.south west,anchor=south west](nodeLCLF) {localfile link: ${refPattern}};
\node[styCTD,left=0pt of nodeLCLF.north west,anchor=south west](nodeCTD) {\textbf{Cited:} ${refPattern} };
\node[styCTD,left=0pt of nodeLCLF.north east,anchor=south east](nodeCTDBY) {\textbf{Cited by:} ${refPattern}};
\node[styTODO,left=0pt of nodeCTD.north west,anchor=south west](nodeCTD) {\t
```
```
$ perl bibreport.pl 'some pattern' <~/WORKDIR/ARTICLE/allReferences.bib >bibREF.tex
```
Context
StackExchange Code Review Q#71100, answer score: 6