Find the maximum line length in a given TSV column

Submitted by: @import:stackexchange-codereview

Problem

I'm trying to write a function that, given a TSV file, returns the maximum length of the items found in a column. I currently have this (working) function. However, I feel that I am missing something, as this solution feels poor compared to solutions I have seen for finding the longest line in a file:

max(open(filename), key=len)


Anyway, I would love some feedback, tough feedback included.

import csv


def get_longest_item_in_tsv_column(filepath, column):
    '''
    Returns the length of the longest item in the given column of a TSV file.
    '''

    # Start at 0 rather than None: int > None raises TypeError in Python 3.
    longest_item = 0

    with open(filepath, 'r') as f:
        csv_reader = csv.reader(f,
                                delimiter='\t')

        for row in csv_reader:

            current_item_length = len(row[column])

            if current_item_length > longest_item:
                longest_item = current_item_length

    return longest_item

Solution

This loop can be recast into something more concise (and likely more efficient), since all you are doing is scanning the file for the length of the longest item in a column:

longest_item = 0  # 0 rather than None: int > None raises TypeError in Python 3
for row in csv_reader:
    current_item_length = len(row[column])
    if current_item_length > longest_item:
        longest_item = current_item_length


Can be replaced with this:

longest_item = max((len(r[column]) for r in csv_reader))


So, what did we build here? Working from the inside out:

len(r[column])


returns the length of the item at index column in the row r. While

(len(r[column]) for r in csv_reader)


produces a generator that yields the length of that column's item for each row returned by csv_reader. And finally:

max((len(r[column]) for r in csv_reader))


returns the largest value of all of the column lengths.
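
Putting the pieces together, the whole function might look something like the sketch below. The name longest_item_length is made up for illustration, and returning 0 for an empty file is an assumption (the question does not say what should happen in that case). The default argument to max requires Python 3.4 or later.

```python
import csv


def longest_item_length(filepath, column):
    """Return the length of the longest item in one column of a TSV file.

    Hypothetical helper: returning 0 for an empty file is an assumption,
    not something the original question specifies.
    """
    # newline='' is what the csv module documentation recommends
    # for files handed to csv.reader.
    with open(filepath, newline='') as f:
        csv_reader = csv.reader(f, delimiter='\t')
        # The generator yields one length per row; default=0 keeps max()
        # from raising ValueError when the file has no rows.
        return max((len(r[column]) for r in csv_reader), default=0)
```

A call such as longest_item_length('data.tsv', 2) (with 'data.tsv' standing in for a real path) would return the length of the longest item in the third column. Rows with fewer than column + 1 fields still raise IndexError, just as in the original loop.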

Context

StackExchange Code Review Q#155797, answer score: 3
