HiveBrain v1.2.0

Loop through all virtualhost log files and run goaccess on each file

Submitted by: @import:stackexchange-codereview

Problem

I have multiple websites running as Apache 2 virtual hosts, and I wanted to use GoAccess to process the access.log of each site.

The directory structure is like so:

/home/www/site1/html
/home/www/site1/log
/home/www/site1/stats

/home/www/site2/html
/home/www/site2/log
/home/www/site2/stats


Some sites have two different access logs, both located in the site's log directory:

ssl.access.log for SSL traffic
access.log for non-SSL traffic

I wanted a cron job to run every night to process the stats with GoAccess, but I didn't want to write multiple lines of nearly identical commands.
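One way to schedule this, as a sketch: save the script as an executable file and add a crontab entry (edit the crontab with crontab -e). The time (02:30) and the path /usr/local/bin/goaccess-stats.sh below are placeholders, not taken from the question.

```
# m  h  dom mon dow  command
30   2  *   *   *    /usr/local/bin/goaccess-stats.sh
```

Keeping the goaccess command in a script rather than directly in the crontab also avoids a cron quirk: in the command field, an unescaped % is turned into a newline, which would mangle --date-format=%d/%b/%Y. Cron also runs jobs with a minimal PATH, so if goaccess is installed outside cron's default PATH, call it by its full path inside the script.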

I've never written a bash script before, so I don't know whether this is the most efficient way of doing things.

Each generated report needs the year and month in its name, so that each night's run overwrites that month's report with the latest stats.

The reports are written to the stats directory of each site, using the following name format:

yyyy-mm.html
sslyyyy-mm.html
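As a sketch of this naming rule in bash (assuming the only log names are access.log and ssl.access.log; report_name is a hypothetical helper), the prefix can be taken as whatever comes before access.log, with its trailing dot removed:

```bash
#!/bin/bash
# Sketch of the report-naming rule described above, assuming the only
# log names are access.log and ssl.access.log.
# report_name is a hypothetical helper, not part of the question's script.

report_name() {
  local log_name=$1 month=$2
  local prefix=${log_name%access.log}   # "ssl.access.log" -> "ssl.", "access.log" -> ""
  prefix=${prefix%.}                    # drop a trailing dot: "ssl." -> "ssl"
  printf '%s%s.html\n' "$prefix" "$month"
}

NOW=$(date +%Y-%m)                      # year and month, e.g. 2024-05
report_name access.log "$NOW"           # <yyyy-mm>.html
report_name ssl.access.log "$NOW"       # ssl<yyyy-mm>.html
```

Because date +%Y-%m only changes at the start of a month, every nightly run within the same month writes to the same file, which gives the overwrite-with-latest-stats behavior.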


The Script

#!/bin/bash

# find all log files which match ess.log
LOG_FILES="/home/www/*/log/*ess.log"

# set the date format
NOW=$(date +"%Y-%m")

# loop through each log file
for f in $LOG_FILES
do

  # drop back from /home/www/site/log to /home/www/site
  path=`dirname $f`
  path=`dirname $path`

  # get the current log filename
  filename=`basename $f`

  # if /home/www/site/stats does not exist - create it
  if [ ! -d "$path/stats" ]; then
    mkdir "$path/stats"
  fi

  # get the first part of the log filename
  prefix=(${filename//./ })

  # if it's equal to access, then it's not the SSL log, so remove the prefix
  if [ $prefix == 'access' ]; then
    prefix=''
  fi

  # run the goaccess process
  goaccess -f $f --date-format=%d/%b/%Y --log-format='%h %^[%d:%^] "%r" %s %b "%R" "%u"' -a > "$path/stats/$prefix$NOW.html" 

done


I know this is a fairly simple task, but since I have no experience with bash scripting, it would be great to know where I could improve this.

Solution

The script has problems with quoting and wildcard expansion: it could break in unexpected ways if any path contains spaces or shell metacharacters. In general, when writing shell scripts, any time you want to write $variable, you should probably write "$variable" instead, so that the variable is expanded in double-quoted context and its value is neither split into words nor treated as a wildcard pattern. The only unquoted variable in this script should be $LOG_FILES, because there you do want wildcard expansion to occur.
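A small, self-contained bash sketch of both pitfalls (the paths are made up): an unquoted variable containing a space is split into two arguments, and an unquoted wildcard that matches nothing is passed through literally unless the bash-specific nullglob option is set.

```bash
#!/bin/bash
# Sketch: how quoting and unmatched globs behave in bash.
# The paths are hypothetical examples, not taken from the question.

count_args() { echo "$#"; }   # prints how many arguments it was given

# 1. Word splitting: an unquoted variable holding a space becomes two words.
f="/home/www/my site/log/access.log"
unquoted=$(count_args $f)     # two arguments: "/home/www/my" and "site/log/access.log"
quoted=$(count_args "$f")     # one argument: the whole path
echo "unquoted: $unquoted, quoted: $quoted"

# 2. Unmatched globs: by default a pattern that matches nothing is kept literally.
pattern="/nonexistent-dir-for-demo/*/log/*access.log"
default_matches=0
for x in $pattern; do default_matches=$((default_matches + 1)); done

shopt -s nullglob             # unmatched patterns now expand to nothing
nullglob_matches=0
for x in $pattern; do nullglob_matches=$((nullglob_matches + 1)); done
shopt -u nullglob

echo "default: $default_matches, nullglob: $nullglob_matches"
```

With no directory named /nonexistent-dir-for-demo, it prints "unquoted: 2, quoted: 1" and "default: 1, nullglob: 0". The literal-pattern case matters for loops like the ones in this thread: if no log file exists, $f is the pattern string itself, and the loop body runs once on a path that does not exist.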

I think you are manipulating variables too much. For each $f, all you care about is the directory that will contain the output and the output's filename. The following script would be shorter and more obvious:

#!/bin/bash
shopt -s nullglob      # If nothing matches, loop zero times instead of once over the literal pattern

LOG_FILES=/home/www/*/log/*access.log
NOW=$(date +%Y-%m)

for f in $LOG_FILES ; do
  path=$(dirname "$f")/../stats
  mkdir -p "$path"                                   # Creates directory as necessary

  case $(basename "$f") in
    ssl.access.log) output="$path/ssl$NOW.html" ;;
        access.log) output="$path/$NOW.html"    ;;
                 *) echo "Unexpected log file: $f" >&2  # e.g. another *access.log name
                    exit 1 ;;
  esac

  goaccess -f "$f" --date-format=%d/%b/%Y \
           --log-format='%h %^[%d:%^] "%r" %s %b "%R" "%u"' -a > "$output"
done
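If you want to avoid starting dirname and basename (external programs, each run in its own command substitution) for every log file, bash parameter expansion can compute the same pieces. A minimal sketch, assuming the /home/www/<site>/log/<file> layout from the question; stats_dir and file_name are hypothetical helper names:

```bash
#!/bin/bash
# Sketch: compute the stats directory and log file name with bash
# parameter expansion instead of running dirname/basename.
# stats_dir and file_name are hypothetical helpers; the layout
# /home/www/<site>/log/<file> is assumed from the question.

stats_dir() {
  # Remove the shortest trailing "/log/<anything>" and append /stats.
  printf '%s/stats\n' "${1%/log/*}"
}

file_name() {
  # Remove everything up to and including the last slash, like basename.
  printf '%s\n' "${1##*/}"
}

stats_dir "/home/www/my site/log/ssl.access.log"   # /home/www/my site/stats
file_name "/home/www/my site/log/ssl.access.log"   # ssl.access.log
```

Unlike dirname "$f"/../stats, this yields a path without a .. component, which reads more clearly in error messages; the trade-off is that it relies on the log directory being named exactly log.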

Context

StackExchange Code Review Q#73444, answer score: 4
