
Extracting lines from a file, the smelly way

Submitted by: @import:stackexchange-codereview

Problem

I have a section of code I use to extract an event log out of a large text file. It works well, but my use of list(itertools.takewhile(...)) to skip over lines feels a little sketchy to me.

Is there a nicer way of doing this?

import itertools

testdata = '''
Lots of other lines...
Really quite a few.

*************
* Event Log *
*************
Col1  Col2  Col3
----- ----- -----
1     A     B
2     A     C
3     B     D

Other non-relevant stuff...
'''

def extractEventLog(fh):
    fhlines = (x.strip() for x in fh)
    list(itertools.takewhile(lambda x: 'Event Log' not in x, fhlines))
    list(itertools.takewhile(lambda x: '-----' not in x, fhlines))
    lines = itertools.takewhile(len, fhlines) # Event log terminated by blank line
    for line in lines:
        yield line # In the real code, it's parseEventLogLine(line)


Expected output:

>>> list(extractEventLog(testdata.splitlines()))
['1     A     B', '2     A     C', '3     B     D']

Solution

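Yes, it is a bit confusing to use takewhile when you don't actually want to take the lines but to discard them. I think it's better to use dropwhile and keep its return value instead of throwing it away. One difference to keep in mind: dropwhile returns the first line that fails the predicate (here, the line with the dashes), so that line has to be skipped explicitly. I believe this captures the intent much more clearly: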

import itertools

def extractEventLog(fh):
    fhlines = (x.strip() for x in fh)
    # Skip everything before the 'Event Log' banner
    lines = itertools.dropwhile(lambda x: 'Event Log' not in x, fhlines)
    # Skip the rest of the banner and the column headings
    lines = itertools.dropwhile(lambda x: '-----' not in x, lines)
    next(lines, None) # Drop the line with the dashes (no error if input ends early)
    lines = itertools.takewhile(len, lines) # Event log terminated by blank line
    for line in lines:
        yield line # In the real code, it's parseEventLogLine(line)
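
The explicit step that drops the dashes line is needed because takewhile and dropwhile treat the boundary element differently. A minimal sketch of that difference, assuming a made-up list of strings (the name boundary_demo and the sample data are illustrative only and not part of the original post):

```python
import itertools

def boundary_demo():
    # takewhile: the first element that fails the predicate is pulled
    # from the underlying iterator and then discarded.
    source = iter(['a', 'b', '-----', '1', '2'])
    taken = list(itertools.takewhile(lambda x: '-----' not in x, source))
    after_take = list(source)  # what is left in the shared iterator

    # dropwhile: the first element that fails the predicate is the
    # first element that gets returned.
    source = iter(['a', 'b', '-----', '1', '2'])
    after_drop = list(itertools.dropwhile(lambda x: '-----' not in x, source))
    return taken, after_take, after_drop

taken, after_take, after_drop = boundary_demo()
print(taken)       # ['a', 'b']
print(after_take)  # ['1', '2']
print(after_drop)  # ['-----', '1', '2']
```

This is why the takewhile version in the question never sees the dashes line, while the dropwhile version has to skip it by hand.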

Context

StackExchange Code Review Q#1344, answer score: 6
