HTML Book Compiler

Submitted by: @import:stackexchange-codereview

Problem

I have written a Bash script that parses HTML pages and compiles them into a single page. The script works as expected, although the code is not fully reusable and the messages it prints to standard output could be more helpful.

Besides my intended improvements, are there any other problems with my code?

#!/bin/bash
#Append each section of the book from each retrieved webpage

function help
{
    ...
}

function compile
{
    ...
    cat << HEADER > $OUTPUT

    
        
        Template
        
    
    
HEADER

    mkdir _pages

    for PAGE in $LIST; do
        echo $PAGE
        wget -q --directory-prefix="_pages/" $PAGE
        #Append the entry
        if [ $? == 0 ]; then
                #Extracts the title from a tag within a retrieved HTML document    
                FILENAME=$(basename $PAGE)
                cat << CONTENT>> $OUTPUT
                $(xmllint --html --xpath '/html/head/title/text()' "_pages/$FILENAME")
                Retrieved from $PAGE
                
                $(xmllint --html --xpath "//div[@id='content']/node()" "_pages/$FILENAME")
                
CONTENT
        else
               cat << SECTION>> $OUTPUT
                $(basename $PAGE)
                
                Unavailable
                
SECTION
        fi
    done

    cat << FINAL>> $OUTPUT
    
 
FINAL

}

if [ $# == 2 ]; then
    #Check if the 'xmllint' executable and the files exist
    if [ ! $(which xmllint) ]; then
        cat << EOF
       The xmllint executable is missing from your system.
       Please install the program first, before using this script.
EOF
    fi

    if [ -x $1 ]; then
        echo "A file containing a list of URL's is missing."
    fi
    compile $1 $2
else
    help
fi

Solution

Paste your code into shellcheck.net; it will give you some interesting recommendations.

Among other things, it reports a parsing error for this line:

cat << CONTENT>> $OUTPUT


Although it works for me as written, it is better to add a space after "CONTENT" so that the delimiter is clearly separated from the redirection, like this:

cat << CONTENT >> $OUTPUT
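
As a minimal sketch of the spaced form in context: the temporary output file and the TITLE value below are assumptions made for illustration and are not taken from the original script. The first here-document creates the file with `>`, and the second appends to it with `>>`.

```shell
#!/bin/bash
# Hypothetical output file, used only for this illustration.
OUTPUT="$(mktemp)"
TITLE="Chapter 1"   # placeholder value, not from the original script

# Create (or truncate) the file with the opening markup.
cat << HEADER > "$OUTPUT"
<html>
<body>
HEADER

# Append a section; note the space between the delimiter and '>>'.
cat << CONTENT >> "$OUTPUT"
<h1>$TITLE</h1>
CONTENT

cat "$OUTPUT"
```

Quoting "$OUTPUT" is a small extra precaution in this sketch, so that a path containing spaces is still treated as a single word.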


When declaring functions, instead of this:

function help
{


The more common, POSIX-compatible convention is this style:

help() {
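
For illustration, a complete function in this style might look like the sketch below. The usage text and the argument names URL_LIST and OUTPUT_FILE are assumptions, since the original help message is not shown here.

```shell
#!/bin/bash
# Function definition without the 'function' keyword, brace on the same line.
help() {
    # Hypothetical usage text; the wording of the real script is not known.
    printf 'Usage: %s URL_LIST OUTPUT_FILE\n' "${0##*/}"
}

help
```

Note that defining a function called help shadows the bash builtin of the same name inside the script, which is harmless here but worth knowing.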


The here-documents disrupt the logical flow of the script, for example here:

    if [ ! $(which xmllint) ]; then
        cat << EOF
       The xmllint executable is missing from your system.
       Please install the program first, before using this script.
EOF
    fi


You can mitigate that disruption by moving the printing logic to a helper function.
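
One way this could look, as a sketch rather than a definitive rewrite: the helper names missing_program and have_program are made up for this example, and `command -v` is swapped in for `which` as the existence check, which is a substitution of this sketch and not something the original script does.

```shell
#!/bin/bash
# Hypothetical helper: prints the "missing program" message for the
# program named in $1. The here-document lives here, out of the main flow.
missing_program() {
    cat << EOF
The $1 executable is missing from your system.
Please install the program first, before using this script.
EOF
}

# Hypothetical helper: succeeds if the named program is found on PATH.
# Uses 'command -v' instead of 'which' (an assumption of this sketch).
have_program() {
    command -v "$1" > /dev/null 2>&1
}

# The calling code now reads as plain logic, with no unindented text
# interrupting it.
if ! have_program xmllint; then
    missing_program xmllint
fi
```

With the message text isolated in its own function, the check itself stays indented consistently with the rest of the script.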

Code Snippets

cat << CONTENT>> $OUTPUT

cat << CONTENT >> $OUTPUT

function help
{

    if [ ! $(which xmllint) ]; then
        cat << EOF
       The xmllint executable is missing from your system.
       Please install the program first, before using this script.
EOF
    fi

Context

StackExchange Code Review Q#98215, answer score: 6
