Extracting lines from a file, the smelly way
Problem
I have a section of code I use to extract an event log out of a large text file. It works well; it's just my use of list(itertools.takewhile(...)) that feels a little sketchy to me.
Is there a nicer way of doing this?
import itertools
testdata = '''
Lots of other lines...
Really quite a few.
*************
* Event Log *
*************
Col1 Col2 Col3
----- ----- -----
1 A B
2 A C
3 B D

Other non-relevant stuff...
'''
def extractEventLog(fh):
    fhlines = (x.strip() for x in fh)
    list(itertools.takewhile(lambda x: 'Event Log' not in x, fhlines))
    list(itertools.takewhile(lambda x: '-----' not in x, fhlines))
    lines = itertools.takewhile(len, fhlines)  # Event log terminated by blank line
    for line in lines:
        yield line  # In the real code, it's parseEventLogLine(line)
Expected output:
>>> list(extractEventLog(testdata.splitlines()))
['1 A B', '2 A C', '3 B D']
Solution
Yes, it is indeed a bit sketchy/confusing to use takewhile when you don't actually want to take the lines, but discard them. I think it's better to use dropwhile and then use its return value instead of discarding it. I believe that captures the intent much more clearly:
import itertools

def extractEventLog(fh):
    fhlines = (x.strip() for x in fh)
    lines = itertools.dropwhile(lambda x: 'Event Log' not in x, fhlines)
    lines = itertools.dropwhile(lambda x: '-----' not in x, lines)
    next(lines, None)  # Drop the line with the dashes (no-op if the input ended first)
    lines = itertools.takewhile(len, lines)  # Event log terminated by blank line
    for line in lines:
        yield line  # In the real code, it's parseEventLogLine(line)
Code Snippets
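As a small illustration of the point made in the solution, the sketch below shows what each function does with the line that ends the skip. The sample lines and variable names are made up for the demonstration and are not part of the original code. takewhile reads the marker line from the shared iterator and throws it away, whereas dropwhile hands it back as its first item, which is why the dropwhile version needs one extra step to discard the dashes.

```python
import itertools

# Hypothetical sample input, already stripped of surrounding whitespace.
lines = ['junk', 'more junk', '-----', '1 A B', '2 A C', '', 'trailer']

# takewhile: the marker line is consumed from the underlying iterator
# and silently discarded, so the iterator resumes at the first data row.
it = iter(lines)
skipped = list(itertools.takewhile(lambda x: '-----' not in x, it))
after_takewhile = next(it)

# dropwhile: the marker line is the first item yielded, so it has to be
# skipped explicitly before reading the data rows.
it = iter(lines)
rest = itertools.dropwhile(lambda x: '-----' not in x, it)
first_from_dropwhile = next(rest)
rows = list(itertools.takewhile(len, rest))  # stops at the blank line

print(skipped)               # ['junk', 'more junk']
print(after_takewhile)       # 1 A B
print(first_from_dropwhile)  # -----
print(rows)                  # ['1 A B', '2 A C']
```

Either way the same rows come out; the dropwhile form simply makes the "skip ahead" intent explicit instead of building throwaway lists.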
import itertools

def extractEventLog(fh):
    fhlines = (x.strip() for x in fh)
    lines = itertools.dropwhile(lambda x: 'Event Log' not in x, fhlines)
    lines = itertools.dropwhile(lambda x: '-----' not in x, lines)
    next(lines, None)  # Drop the line with the dashes (no-op if the input ended first)
    lines = itertools.takewhile(len, lines)  # Event log terminated by blank line
    for line in lines:
        yield line  # In the real code, it's parseEventLogLine(line)
Context
StackExchange Code Review Q#1344, answer score: 6