n largest files in a directory


Problem

This is a script I wrote to find the n biggest files in a given directory (recursively):

import heapq
import os, os.path
import sys
import operator

def file_sizes(directory):
    for path, _, filenames in os.walk(directory):
        for name in filenames:
            full_path = os.path.join(path, name)
            yield full_path, os.path.getsize(full_path)

num_files, directory = sys.argv[1:]
num_files = int(num_files)

big_files = heapq.nlargest(
        num_files, file_sizes(directory), key=operator.itemgetter(1))
print(*("{}\t{:>}".format(*b) for b in big_files))


It can be run as, e.g.: bigfiles.py 5 ~

Ignoring the complete lack of error handling, is there any obvious way to make this clearer, or at least more succinct? I am considering, e.g., using namedtuples in file_sizes, but is there also a way to implement file_sizes as a generator expression? (I suspect not without having two calls to os.path, but I'd love to be proven wrong :-)

Solution

You could replace your function with:

file_names = (os.path.join(path, name) for path, _, filenames in os.walk(directory)
        for name in filenames)

file_sizes = ((name, os.path.getsize(name)) for name in file_names)


However, I'm not sure that really helps the clarity.
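To try the two generator expressions above in isolation, here is a minimal sketch that wraps them in a function and runs them on a throwaway directory. The function name iter_file_sizes and the demo file names are hypothetical, chosen only for illustration; the expressions themselves are the ones shown above.

```python
import os
import tempfile


def iter_file_sizes(directory):
    """Return a lazy iterator of (full_path, size_in_bytes) pairs.

    The name iter_file_sizes is hypothetical; it only wraps the two
    generator expressions so they can be reused.
    """
    # First generator expression: flatten os.walk into full paths.
    file_names = (
        os.path.join(path, name)
        for path, _, filenames in os.walk(directory)
        for name in filenames
    )
    # Second generator expression: pair each path with its size.
    return ((name, os.path.getsize(name)) for name in file_names)


# Demonstration on a temporary tree with two files of known size.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    for rel_path, size in [("a.bin", 10), (os.path.join("sub", "b.bin"), 300)]:
        with open(os.path.join(root, rel_path), "wb") as fh:
            fh.write(b"\0" * size)
    demo_sizes = sorted(size for _, size in iter_file_sizes(root))
    print(demo_sizes)
```

Nothing is read from disk until the returned iterator is consumed, so it can be passed straight to heapq.nlargest just like the original generator function.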

I found that the following actually runs slightly quicker than your version:

big_files = heapq.nlargest(
        num_files, file_names, key=os.path.getsize)
print(*("{}\t{:>}".format(b, os.path.getsize(b)) for b in big_files))
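As a self-contained sketch of the key=os.path.getsize variant, the snippet below selects the largest paths first and looks up each size a second time only for the files that are reported. The function name largest_files and the demo files are hypothetical, and passing sep="\n" to print is an assumption about the intended output (one file per line), not part of the original code.

```python
import heapq
import os
import tempfile


def largest_files(directory, n):
    """Return the n largest files under directory as (path, size) pairs.

    The name largest_files is hypothetical, used only for this sketch.
    """
    file_names = (
        os.path.join(path, name)
        for path, _, filenames in os.walk(directory)
        for name in filenames
    )
    # The size is used as the sort key while selecting the n largest paths.
    biggest = heapq.nlargest(n, file_names, key=os.path.getsize)
    # The size is fetched again, but only for the n selected files.
    return [(name, os.path.getsize(name)) for name in biggest]


# Demonstration on a temporary directory with three files of known size.
with tempfile.TemporaryDirectory() as root:
    for file_name, size in [("small.bin", 5), ("mid.bin", 50), ("big.bin", 500)]:
        with open(os.path.join(root, file_name), "wb") as fh:
            fh.write(b"\0" * size)
    demo_result = [
        (os.path.basename(path), size) for path, size in largest_files(root, 2)
    ]
    # Assumed output format: one "name<TAB>size" entry per line.
    print(*("{}\t{}".format(*entry) for entry in demo_result), sep="\n")
```

The trade-off is that the heap holds plain path strings rather than tuples, at the cost of a repeated os.path.getsize call for each reported file.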

Context

StackExchange Code Review Q#8958, answer score: 4
