Filtering lines in a log file using multiple regexes
Problem
In Python 2.7:
Two nested for loops are always a little inefficient, especially in Python. Is there a better way to write the following filter function?
It tags a line from a log file if it is useful; otherwise the line is ignored. Because there are several kinds of interesting lines, it tries a list of compiled regexes against each line until one matches. Note that no further regexes are checked for a line once one has matched.
(The "tagging" is done with the regex object itself, because it can be used later on to retrieve substrings of a line, such as the filename and row number of an error that occurred.)
    def filter_lines(instream, filters):
        """ignore lines that aren't needed

        :param instream: an input stream like sys.stdin
        :param filters: a list of compiled regexes
        :yield: a tuple (line, regex)
        """
        for line in instream:
            for regex in filters:
                if regex.match(line):
                    yield (line, regex)
                    # stop at the first regex that matches this line
                    break
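To illustrate the tagging idea, here is a minimal sketch of how a consumer might use the yielded regex to pull substrings out of a line. The patterns, the group names, the sample lines, and the `describe` helper are all hypothetical and invented for this example; only `filter_lines` comes from the question.

```python
import re

def filter_lines(instream, filters):
    """Yield (line, regex) for the first regex in filters that matches each line."""
    for line in instream:
        for regex in filters:
            if regex.match(line):
                yield (line, regex)
                break

# Hypothetical patterns and log lines, invented for this illustration.
ERROR_RX = re.compile(r"ERROR (?P<filename>\S+):(?P<row>\d+)")
WARN_RX = re.compile(r"WARNING (?P<message>.*)")

def describe(tagged_lines):
    """Turn (line, regex) pairs into dicts of the regex's named groups."""
    for line, regex in tagged_lines:
        # filter_lines only yields the regex, so match again to get the groups.
        yield regex.match(line).groupdict()

sample = [
    "ERROR main.c:42 undefined symbol\n",
    "DEBUG nothing to see here\n",
    "WARNING disk almost full\n",
]
results = list(describe(filter_lines(sample, [ERROR_RX, WARN_RX])))
```

Note that the consumer has to run the match a second time to reach the groups, which is a cost of tagging with the regex rather than with the match object.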
Solution
I wouldn’t worry about performance of the loop here. The slow thing isn’t the loop, it’s the matching of the expressions.
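That said, I’d express the nested loops via a generator expression instead. The loop over lines has to come first, otherwise a one-shot stream such as sys.stdin is exhausted by the first regex. Note also that, unlike the original, this yields a pair for every regex that matches a line instead of stopping at the first match:

    def filter_lines(instream, filters):
        return ((line, regex) for line in instream for regex in filters if regex.match(line))

Or alternatively, using higher-order list functions. The same caveat applies, and itertools.product reads the whole input stream into memory before it produces any pairs:

    import itertools

    def filter_lines(instream, filters):
        # the tuple-unpacking lambda is Python 2 syntax
        return filter(lambda (line, rx): rx.match(line), itertools.product(instream, filters))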
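The function in the question stops at the first regex that matches a line. A minimal sketch of one way to keep that behaviour without writing the inner loop by hand is to take the first hit with `next()` and a default of `None`. The helper name `first_match`, the patterns, and the sample lines are hypothetical and only serve the example.

```python
import re

def first_match(line, filters):
    """Return the first regex in filters that matches line, or None."""
    return next((regex for regex in filters if regex.match(line)), None)

def filter_lines(instream, filters):
    """Lazily yield (line, regex), using only the first matching regex per line."""
    tagged = ((line, first_match(line, filters)) for line in instream)
    return ((line, regex) for line, regex in tagged if regex is not None)

# Hypothetical patterns: the second one also matches every line the first matches.
specific = re.compile(r"ERROR \d+")
general = re.compile(r"ERROR")

lines = ["ERROR 17 boom", "INFO fine", "ERROR unknown"]
pairs = list(filter_lines(lines, [specific, general]))
```

Here the first line is tagged with `specific` only, even though `general` would match it too, and the stream is still consumed lazily, one line at a time.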
Context
StackExchange Code Review Q#14368, answer score: 3