
Automatic report generation from BibTeX

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Problem

I wrote the following code for the question I asked here. Are there any catastrophic mistakes? (It did not really work with perltex, and I suspect it contains syntax mistakes.)

I also read about lists, hashes (list vs. hash is still a huge question mark in my mind) and foreach in order to improve my script. I am somewhat confused about the best way to combine them (my attempt to define the variables with a list has failed so far). I am sure I will find my way, but some help would get me there quicker. You can see how I repeat the same lines again and again.

```
#!/usr/bin/perl
#
# Strict and warnings are recommended.
use strict;
use warnings;
no warnings "uninitialized"; # Don't warn about unitialized variables#
use Term::ANSIColor; #Some fun
# use 5.010;

# print colored ['bold blue'], "Hey I am a Perl,\n";
# print "I can find all the informations you need for a specific reference from regular";
# print colored ['bold blue'], " bibtex ";
# print "file\n";

my $in = 0; #logical to check am I in a good line or not
my $refPattern='Wang2013'; #which reference I am trying to read
# my $refPattern='test2013';
# my $refPattern='',"$_[0]",''; #which reference I am trying to read
my $filename = '/home/solak/WORKDIR/ARTICLE/allReferences.bib'; #bibtex file storing informations

foreach (@ARGV) {
$refPattern = $_;
};

# print "My variable $refPattern\n";

#possible to improve with list or hash, find them with forearch etc. (did not work yet)
my $abstract ; #lesson:perl scope has crazy/sensible way
my $author ; #lesson:perl scope has crazy/sensible way
my $doi ; #lesson:perl scope has crazy/sensible way
my $issn ; #lesson:perl scope has crazy/sensible way
m
```

Solution

There are some nice things about this script: It seems to work, you're using strict, and how you open your files is very good (you even consider encoding!).

Unfortunately, it is extremely difficult to follow. Reasons for this are:

- This program has experienced “organic growth”. There are unnecessary print commands for debugging everywhere, and commented out code has not been removed. That will be the first thing we'll get rid of.

- You are not very experienced with Perl. Your logic is encoded in a roundabout way, and you are not aware of techniques for better abstraction and simplification.

- You seem to love fixed-width lines, without any need. Sometimes it may be useful to vertically align related items, but you take this to absurd lengths.

Let's start with your while loop. The control flow in this loop is:

for each line:
  set $in if a pattern matches
  break if we encounter an empty line
  if $in is set:
    set $line to current line if $in is set
    parse the $line
    next iteration if $content is empty
    set the correct variable to the $content

The complete second part of the loop body is an if. There is no else. This can be replaced by going to the next iteration if the if-condition is false. Both have the same effect of skipping over the rest of the loop body. Also, the line $line = $_ if $in is unnecessary once we already know that $in == 1.
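To see that both forms really do the same thing, here is a minimal sketch on made-up input. The @lines data and the "keep" pattern are assumptions for illustration only; they are not taken from your script.

```perl
use strict;
use warnings;

# Hypothetical sample input; the real script reads lines from a file.
my @lines = ("skip me\n", "keep: one\n", "skip too\n", "keep: two\n");

# Nested form: the whole remainder of the loop body sits inside an if.
my @nested;
for my $line (@lines) {
    if ($line =~ /^keep/) {
        chomp(my $copy = $line);
        push @nested, $copy;
    }
}

# Flat form: leave the iteration early with next, then carry on unindented.
my @flat;
for my $line (@lines) {
    next unless $line =~ /^keep/;
    chomp(my $copy = $line);
    push @flat, $copy;
}

print "$_\n" for @flat;
```

The flat form saves one level of indentation, which matters more the longer the loop body gets.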

We can get rid of the $in variable, by splitting the while loop into two loops. The first loop only searches for the marker pattern. The second loop does the parsing.

We can also get rid of the $foo = $content if $field eq "foo" copy&paste-madness. You correctly noted that a hash makes sense here. We do not have to declare the fields of a hash up front. Instead, we declare an empty hash like my %hash, and fill in fields like $hash{$field} = $content.
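Since you asked about lists versus hashes: an array is ordered and indexed by position, while a hash is indexed by name. A minimal sketch of filling a hash on the fly and walking over it with foreach might look like this; the sample pairs in @parsed are made up and merely stand in for parsed BibTeX lines.

```perl
use strict;
use warnings;

# Hypothetical field/content pairs, standing in for parsed BibTeX lines.
my @parsed = (
    [ title  => 'An Example Title' ],
    [ author => 'A. Author' ],
    [ year   => '2013' ],
);

my %fields;    # starts empty; keys are created on first assignment
for my $pair (@parsed) {
    my ($field, $content) = @$pair;
    $fields{$field} = $content;
}

# Iterate over whatever fields were found, in sorted key order.
for my $field (sort keys %fields) {
    print "$field = $fields{$field}\n";
}
```

No field name appears in the loop itself, so supporting a new field needs no new code.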

Your parsing code can be drastically simplified by building a larger regex. Your syntax seems to be $field = "$content",. This can be expressed by a regex with two capture groups:

/\A \s* (\w+) \s*[=]\s* "(.+)" [,]? \s*\z/x

If that regex matches, the field name is in $1 and the contents in $2.
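As a quick check of the two capture groups, here is a sketch that applies such a regex (with optional whitespace on both sides of the equals sign) to a single line. The sample line is an assumption and not taken from your .bib file.

```perl
use strict;
use warnings;

# Hypothetical BibTeX-style line; real input comes from the .bib file.
my $line = qq{  title = "An Example Title",\n};

my ($field, $contents);
if ($line =~ /\A \s* (\w+) \s*[=]\s* "(.+)" [,]? \s*\z/x) {
    # $1 is the field name, $2 the text between the double quotes.
    ($field, $contents) = ($1, $2);
    print "field=$field contents=$contents\n";
}
```

Note that this pattern only handles values in double quotes; a line whose value is wrapped in braces instead would not match.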

Putting all of this together, we can rewrite your loop as

my $marker = qr/$refPattern/i;
my %fields;

# find the marker
while (<$bibfile>) {
  if (/$marker/) {
     parse_line($_, \%fields);
     last;
  }
}

# parse the following lines until the first blank line
while (my $line = <$bibfile>) {
  last if not $line =~ /\S/;
  parse_line($line, \%fields);
}

# store the field of one line in the hash, if the line contains one
sub parse_line {
  my ($line, $fields) = @_;
  return if not $line =~ /\A \s* (\w+) \s*[=]\s* "(.+)" [,]? \s*\z/x;
  my ($field, $contents) = ($1, $2);
  $fields->{$field} = $contents;
  return;
}

In the above code I did two things that may be new to you:

- I declared a subroutine called parse_line, which conveniently encapsulates all the line parsing logic. This is good because the first line is parsed separately from the other lines, and I didn't want to repeat all the code. The parse_line subroutine takes two arguments: The line and the fields.

- While Perl subroutines cannot take a hash as a single argument (it would be flattened into a list of keys and values), they can take a reference to a hash as an argument. This is similar to pointers in C, except that Perl uses the \ backslash operator to get a reference to something. Because we are dealing with a reference and not with a hash, we don't access the field as $hash{$field} but as $hashref->{$field}.
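The reference mechanics can be tried out in isolation. In the following sketch the subroutine name remember and the sample value are hypothetical and exist only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: stores one key/value pair through a hash reference.
sub remember {
    my ($store, $key, $value) = @_;
    $store->{$key} = $value;    # the arrow dereferences the reference
    return;
}

my %fields;
remember(\%fields, journal => 'Some Journal');    # \ takes a reference

# The caller's hash was modified in place.
print "$fields{journal}\n";
```

Because only a reference is passed, the subroutine changes the caller's hash directly and does not have to return a copy.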

Now since we store all the fields in a hash, we don't have variables like $title any more. This is no problem, since we can substitute that with $fields{title} in the remainder of your code.

The rest of your code is just a template. However, you print each line out separately and explicitly add newlines. If you use 5.010 or use feature 'say', you unlock the say function, which behaves exactly like print but appends a newline to its output. In this case, I'd rather emphasize the template character of this code by defining a mini template language: s/[$][{](\w+)[}]/$fields{$1}/eg. That substitution replaces any word in braces preceded by a dollar sign with the contents of the field of the same name in %fields.
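Here is a minimal sketch of that substitution on a one-line template. The field values and the template text are made up for illustration; the single quotes keep Perl from interpolating ${title} itself.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical field values and template text, for illustration only.
my %fields   = (title => 'An Example Title', year => '2013');
my $template = 'Title: ${title} (${year})';

# Replace each ${name} with the matching entry of %fields.
(my $output = $template) =~ s/[$][{](\w+)[}]/$fields{$1}/eg;

say $output;    # say appends the newline that print would need explicitly
```

A placeholder without a matching field would be replaced by an empty string, and would trigger an "uninitialized" warning if those warnings are enabled.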

The template itself can be stored in the __DATA__ section of the script. Everything after the __DATA__ marker is not interpreted as Perl code, but is available via the special file handle DATA.

Implementing that might look like this:

```
# fill additional fields
$fields{refPattern} = $refPattern;

# process template
my $template = join '', <DATA>;
$template =~ s/[$][{](\w+)[}]/$fields{$1}/eg;

# output result
open my $fh, '>', $filename or die "Could not open file '$filename': $!";
print $fh $template;

__DATA__
%HEY This is generated by perl.pl, do not touch it, if you need changes go to mother file!
\documentclass[12pt]{standalone}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{tikz}
\usepackage{gnuplot-lua-tikz}
\usepackage[shell]{gnuplottex}
\usepackage[]{tcolorbox}
\usetikzlibrary{backgrounds,calc,positioning}
\th
```

Code Snippets

for each line:
  set $in if a pattern matches
  break if we encounter an empty line
  if $in is set:
    set $line to current line if $in is set
    parse the $line
    next iteration if $content is empty
    set the correct variable to the $content

/\A \s* (\w+) \s*[=]\s* "(.+)" [,]? \s*\z/x

my $marker = qr/$refPattern/i;
my %fields;

# find the marker
while (<$bibfile>) {
  if (/$marker/) {
     parse_line($_, \%fields);
     last;
  }
}

while (my $line = <$bibfile>) {
  last if not $line =~ /\S/;
  parse_line($line, \%fields);
}

sub parse_line {
  my ($line, $fields) = @_;
  return if not $line =~ /\A \s* (\w+) \s*[=]\s* "(.+)" [,]? \s*\z/x;
  my ($field, $contents) = ($1, $2);
  $fields->{$field} = $contents;
  return;
}

# fill additional fields
$fields{refPattern} = $refPattern;

# process template
my $template = join '', <DATA>;
$template =~ s/[$][{](\w+)[}]/$fields{$1}/eg;

# output result
open my $fh, '>', $filename or die "Could not open file '$filename': $!";
print $fh $template;

__DATA__
%HEY This is generated by perl.pl, do not touch it, if you need changes go to mother file!
\documentclass[12pt]{standalone}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{tikz}
\usepackage{gnuplot-lua-tikz}
\usepackage[shell]{gnuplottex}
\usepackage[]{tcolorbox}
\usetikzlibrary{backgrounds,calc,positioning}
\thispagestyle{empty}

\begin{document}

\begin{tikzpicture}[inner sep=0pt]
    \tikzstyle{styBIBR} = [draw,fill=black!60,minimum height={30pt},rectangle,text width=5.5cm,text centered,font=\bf,text=white,font=\huge];
    \tikzstyle{styYEAR} = [draw,fill=blue!60,minimum height={20pt},rectangle,text width=5.5cm,text centered,font=\bf,text=white];
    \tikzstyle{styTITLE}= [draw,,rectangle,minimum height={60pt},text width=11.5cm,,font=\bf, font=\Large];
    \tikzstyle{styATHR} = [draw,,rectangle,minimum height={30pt},text width=17cm,,font=\large];
    \tikzstyle{styJRNL} = [draw,,minimum height={30pt},rectangle,text width=5.5cm,text centered,text=black,font=\small];
    \tikzstyle{styKWRD} = [draw,rectangle,minimum height={30pt},text width=17cm,,font=\bf, font=\small];
    \tikzstyle{styLCLF} = [draw,rectangle,minimum height={10pt},text width=17cm,text=blue, font=\footnotesize];  %local file
    \tikzstyle{styURL}  = [draw,fill=blue!20,rectangle,minimum height={20pt},text width=11.5cm,,font=\bf, font=\Large];  %local file
    \tikzstyle{styNOTES}= [draw,rectangle,minimum height={380pt},text width=17cm,];
    \tikzstyle{styTODO} = [draw,rectangle,minimum height={110pt},text width=17cm];
    \tikzstyle{styCTD}  = [draw,rectangle,minimum height={90pt},text width=8.5cm];
    \draw [draw,use as bounding box] (0cm,0cm) rectangle (17cm, 25cm);

    %%%%%%%%%%%%%%%
    % top group
    %%%%%%%%%%%%%%%
    \node[styTITLE,left=0pt of current bounding box.north west, ,anchor=north west](nodeTITLE) {${title}};
    \node[styATHR,left=0pt of nodeTITLE.south west, anchor=north west](nodeATHR) {${refPattern}};
    \node[styKWRD,left=0pt of nodeATHR.south, anchor=north](nodeKWRD) {Keywords: ${author}};
    \node[styBIBR,left=0pt of nodeTITLE.north east, anchor=north west] (nodeBIBR) {${refPattern}};
    \node[styJRNL,left=0pt of nodeBIBR.south, anchor=north] (nodeJRNL) {${journal}};

    %%%%%%%%%%%%%%%
    % bottom group
    %%%%%%%%%%%%%%%
    %local file link
    \node[styLCLF,left=0pt of current bounding box.south west,anchor=south west](nodeLCLF) {localfile link: ${refPattern}};
    \node[styCTD,left=0pt of nodeLCLF.north west,anchor=south west](nodeCTD) {\textbf{Cited:} ${refPattern} };
    \node[styCTD,left=0pt of nodeLCLF.north east,anchor=south east](nodeCTDBY) {\textbf{Cited by:} ${refPattern}};
    \node[styTODO,left=0pt of nodeCTD.north west,anchor=south west](nodeCTD) {\t

$ perl bibreport.pl 'some pattern' <~/WORKDIR/ARTICLE/allReferences.bib >bibREF.tex

Context

StackExchange Code Review Q#71100, answer score: 6
