Counting unique visitors in access log
Problem
I'm aware I am probably reinventing the wheel somewhat here, but I am trying to teach myself simple bash coding by completing simple tasks such as parsing files.
To that end I am looking to learn what elephants in the room I could be missing or if there are better ways to use the core functionality of bash without installing any additional tools.
This simple code returns a list of unique IP addresses that have hit the index of my site along with a count of the hits.
a="access.log"; for b in $(cat $a | awk '{print $1}' | sort | uniq);do echo $b;grep $a -e "GET / HTTP" | grep -c $b;done;
Assumptions:
access.log is in the current directory and is in the regular format
Any advice or suggestions for improvement are greatly appreciated.
Solution
Well, your code is hardly a bash solution, is it? You use sort, awk, grep, and echo....
Additionally, your code is dumped on a single line, and it makes it hard to read. Why not put it in a script, and have separate commands on separate lines.... like:
#!/bin/bash
a="access.log"
for b in $(cat $a | awk '{print $1}' | sort | uniq); do
    echo $b;
    grep $a -e "GET / HTTP" | grep -c $b;
done;
Those variable names.... ouch. a and b are hard to tell apart from the -c and -e options too.... and they mean nothing otherwise. Why not use meaningful names like ip and log?
Then, when I ran the code, I got a lot of funny results.... like:
54.69.125.145
1
61.240.144.65
0
64.14.99.254
0
66.196.235.78
0
66.249.64.188
0
74.208.152.232
0
Why are there 0 counts.... oh, that's because those are IPs that are not accessing the home page, but are accessing other pages... they appear as $b but don't actually "GET" /.
I would consider making it more a study of bash and use the native bash structures to get things right.... no grep, awk, etc.
#!/bin/bash
# use first commandline argument if supplied
log="access.log"
if [ -n "$1" ] ; then
    log="$1"
fi
# set a variable to match in a regular expression
match="GET / HTTP"
# create an associative array, keyed by IP (needs bash 4 or later).
declare -A counts
# read the file line-by-line (the || test keeps a last line with no trailing newline)
while IFS='' read -r line || [[ -n "$line" ]]; do
    # find lines that access GET / HTTP
    if [[ $line =~ $match ]] ; then
        # get just the IP of the client
        ip=${line%% *}
        # get the previous count, default to 0
        prev=${counts[$ip]:-0}
        # increment the count for this IP
        counts[$ip]=$((prev + 1))
    fi
done < "$log"
for ip in "${!counts[@]}" ; do
    echo "IP $ip visited ${counts[$ip]} times"
done
EDIT: About the ${line%% *} variable substitution. The possibilities when doing variables in bash are remarkably powerful. I recommend looking at the document Parameter Substitution for details, and the man page for bash is good as well (but does not have the examples). The %% token indicates that there should be a pattern search backward from the end of $line for a space followed by any characters (the * - this is a "glob" expression, not a regex). Because %% takes the longest match, this pattern essentially looks for the first space, and removes it and any characters after it. The man page document says:
${parameter%word}
${parameter%%word}
The word is expanded to produce a pattern just as in filename expansion.
If the pattern matches a trailing portion of the expanded value of
parameter, then the result of the expansion is the value of parameter
with the shortest matching pattern (the ‘%’ case) or the longest matching
pattern (the ‘%%’ case) deleted. If parameter is ‘@’ or ‘*’, the pattern
removal operation is applied to each positional parameter in turn, and the
expansion is the resultant list. If parameter is an array variable
subscripted with ‘@’ or ‘*’, the pattern removal operation is applied to
each member of the array in turn, and the expansion is the resultant list.
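To make the difference between % and %% concrete, here is a minimal sketch run against a single made-up log line. The line and the variable names ip and without_last_field are assumptions for illustration only; they are not taken from the original access.log.

```shell
#!/bin/bash
# Hypothetical sample line in the usual access-log layout (values are made up).
line='203.0.113.7 - - [10/Oct/2016:13:55:36 +0000] "GET / HTTP/1.1" 200 2326'

# %% deletes the LONGEST suffix matching the glob ' *',
# i.e. everything from the first space onward.
ip=${line%% *}

# % deletes the SHORTEST suffix matching ' *',
# i.e. only the last space and the field after it.
without_last_field=${line% *}

printf '%s\n' "$ip"
printf '%s\n' "$without_last_field"
```

With the longest match the result is just the client address, which is why the script above uses %% rather than %.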
Context
StackExchange Code Review Q#141270, answer score: 5