Filtering file for command line tool

Problem

I've written a little command-line utility for filtering log files. It's like grep, except instead of operating on lines it operates on log4j-style messages, which may span multiple lines, with the first line always including the logging level (TRACE, DEBUG etc.).

Example usage on a file short.log with contents like this:

16:16:12 DEBUG - Something happened, here's a couple lines of info:
debug line
another debug line
16:16:14 - I'm being very verbose 'cause you've put me on TRACE
trace info
16:16:15 TRACE - single line trace
16:16:16 DEBUG - single line debug


logrep -f short.log DEBUG produces:

16:16:12 DEBUG - Something happened, here's a couple lines of info:
debug line
another debug line
16:16:16 DEBUG - single line debug


I think the main loop of the program could probably be simplified with some sort of parse and filter.

file = fileinput.input(options.file)
try:
    line = file.next()
    while True:
        if any(s in line for s in loglevels):
            if filter in line:
                sys.stdout.write(line)
                line = file.next()
                while not any(s in line for s in loglevels):
                    sys.stdout.write(line)
                    line = file.next()
                continue
        line = file.next()
except StopIteration:
    return

Solution

I see two problems with your loops.

In terms of style, calling file.next() and catching StopIteration is highly unconventional. The normal way to iterate is:

for line in fileinput.input(options.file):
    …


In terms of functionality, I would consider the grepper buggy: it fails to find a log message when the keyword you are searching for appears only on a continuation line.
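
To make that second point concrete, here is a minimal Python 3 sketch. The helper `line_based_filter` is hypothetical: it mimics the question's logic with a `for` loop rather than reproducing it exactly, and `LOGLEVELS` is an assumed value, since the question does not show `loglevels`. The sample lines are made up for illustration.

```python
LOGLEVELS = ("TRACE", "DEBUG")  # assumption: the question's `loglevels` is not shown


def line_based_filter(lines, keyword):
    """Mimic the question's logic: only the first line of a message is tested."""
    selected = []
    printing = False
    for line in lines:
        if any(level in line for level in LOGLEVELS):  # first line of a message
            printing = keyword in line
        if printing:
            selected.append(line)
    return selected


sample = [
    "16:16:12 DEBUG - request failed:\n",
    "  caused by timeout\n",
    "16:16:13 TRACE - done\n",
]

# "timeout" occurs only on a continuation line, so nothing is selected.
print(line_based_filter(sample, "timeout"))
# A keyword on the first line selects the whole two-line message.
print(line_based_filter(sample, "failed"))
```

The first call returns an empty list even though the DEBUG message clearly mentions the timeout, which is the behaviour I would call a bug.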

To solve both problems, I would decompose the problem into two parts: reconstructing the logical messages (somewhat ugly) and searching (relatively straightforward).

import fileinput
import re
import sys

def log_messages(lines):
    """
    Given an iterator of log lines, generate pairs of
    (level, message), where message is a logical log message,
    possibly multi-line.
    """
    log_level_re = re.compile(r'\b(TRACE|DEBUG|INFO|WARN|ERROR|FATAL|CRITICAL)\b')
    message = None
    for line in lines:
        match = log_level_re.search(line)
        if match:                               # First line
            if message is not None:
                yield level, message
            level, message = match.group(), line
        elif message is not None:               # Continuation line
            message += line
    if message is not None:                     # End of file
        yield level, message

for level, message in log_messages(fileinput.input(options.file)):
    if filter in message:
        sys.stdout.write(message)
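
As a rough check, the sketch below runs this approach against the short.log sample from the question, in Python 3 and without `fileinput` so that it is self-contained. `grep_messages` and `SHORT_LOG` are hypothetical names introduced for the example, and the level pattern is hoisted to module level purely as a choice of this sketch.

```python
import re

# A level pattern like the one above; adjust the alternatives to your own logs.
LOG_LEVEL_RE = re.compile(r'\b(TRACE|DEBUG|WARN|ERROR|CRITICAL)\b')


def log_messages(lines):
    """Yield (level, message) pairs, joining continuation lines to their header."""
    level, message = None, None
    for line in lines:
        match = LOG_LEVEL_RE.search(line)
        if match:                          # first line of a new message
            if message is not None:
                yield level, message
            level, message = match.group(), line
        elif message is not None:          # continuation line
            message += line
    if message is not None:                # flush the last message
        yield level, message


def grep_messages(lines, keyword):
    """Hypothetical helper: return the messages that contain `keyword`."""
    return [message for _, message in log_messages(lines) if keyword in message]


SHORT_LOG = """\
16:16:12 DEBUG - Something happened, here's a couple lines of info:
debug line
another debug line
16:16:14 - I'm being very verbose 'cause you've put me on TRACE
trace info
16:16:15 TRACE - single line trace
16:16:16 DEBUG - single line debug
"""

lines = SHORT_LOG.splitlines(keepends=True)
print("".join(grep_messages(lines, "DEBUG")), end="")
```

This should print the same four lines as the `logrep -f short.log DEBUG` example in the question. Searching for "another" instead selects the first message as a whole, because the keyword is matched against the full message and not just its first line.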


Note that I've used a regular expression to look for TRACE, DEBUG, etc. The \b anchors ensure that we don't mistake words like "INTRACELLULAR" for a TRACE message.
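
A small sketch of what the anchors buy you, comparing a plain substring pattern with the `\b`-anchored one (the names `anchored` and `unanchored` are made up for this illustration):

```python
import re

unanchored = re.compile(r'TRACE')
anchored = re.compile(r'\bTRACE\b')

word = "INTRACELLULAR signalling"
header = "16:16:15 TRACE - single line trace"

# Without anchors, TRACE is found inside the longer word.
print(bool(unanchored.search(word)))
# With anchors, there is no word boundary on either side of TRACE, so no match.
print(bool(anchored.search(word)))
# A real header line still matches, because spaces delimit the level name.
print(bool(anchored.search(header)))
```

The anchors only rule out matches inside longer words. A continuation line that contains a level name as a standalone word, such as "ERROR", would still be taken for the start of a new message.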

Context

StackExchange Code Review Q#31768, answer score: 2
