
Assembling very large files

Submitted by: @import:stackexchange-codereview

Problem

I have users uploading files, sometimes as large as 100+ GB, to a local web server. The upload process works well, and chunks come in at 50 MB each. The problem seems to occur after the upload finishes, when the web server assembles the chunks into one file: the server (24 GB RAM) gets very sluggish, even though it shows no graphical signs of memory pressure. I want to make sure my code isn't causing any unnecessary slowdowns.

Please suggest a more efficient way to do it. If the code is already OK, then I'll know to look at other aspects of the process.

# open the temp file to write into
with open(temp_filename, 'wb') as temp_file:
    # loop over the chunks
    for i in range(total_chunks):
        with open(os.path.join(get_chunk_filename(chunk_identifier, i + 1)), 'rb') as chunk_file:
            # write the chunk to the temp file
            temp_file.write(chunk_file.read())

Solution

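I suggest you use existing library functions, e.g. shutil.copyfileobj, to do the copying. Edit to clarify, as Gareth said: use shutil.copyfileobj(chunk_file, temp_file) instead of temp_file.write(chunk_file.read()). copyfileobj copies through a small fixed-size buffer, so a whole 50 MB chunk is never held in memory at once.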
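
As a minimal sketch of that suggestion: the function name assemble_chunks, its parameters, and the 1 MiB buffer size below are illustrative assumptions, not part of the original code. The only library call it relies on is shutil.copyfileobj(fsrc, fdst, length).

```python
import shutil


def assemble_chunks(chunk_paths, output_path, buffer_size=1024 * 1024):
    """Concatenate the files in chunk_paths, in order, into output_path.

    Hypothetical helper for illustration; buffer_size is an assumed value.
    """
    with open(output_path, "wb") as output_file:
        for chunk_path in chunk_paths:
            with open(chunk_path, "rb") as chunk_file:
                # Copy buffer_size bytes at a time instead of reading the
                # whole chunk into a single bytes object first.
                shutil.copyfileobj(chunk_file, output_file, buffer_size)
```

With the names from the question, chunk_paths would be [get_chunk_filename(chunk_identifier, i + 1) for i in range(total_chunks)] and output_path would be temp_filename.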

Other than that (chunk_file.read() allocates a Python object holding each whole chunk), there are no obvious flaws in the code, but I/O in Python is best avoided at that scale anyway. I'd even say you could try a shell script with cat $FILES > $OUTPUT, and it could perform better.
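
If the rest of the upload handling stays in Python, one way to try the cat idea is to launch cat from Python. This is only a sketch under assumptions: it needs a Unix-like system with cat on the PATH, the function name assemble_with_cat is hypothetical, and whether it is actually faster would have to be measured.

```python
import subprocess


def assemble_with_cat(chunk_paths, output_path):
    """Concatenate chunk_paths, in order, into output_path using cat.

    Hypothetical helper for illustration; assumes cat is available.
    """
    with open(output_path, "wb") as output_file:
        # Passing the paths as an explicit list avoids shell quoting issues
        # and preserves the given order; a shell glob would sort chunk names
        # lexically (chunk 10 before chunk 2).
        subprocess.run(
            ["cat", "--", *chunk_paths],
            stdout=output_file,
            check=True,  # raise CalledProcessError if cat fails
        )
```

The data then flows from cat straight to the output file without passing through Python objects.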

Context

StackExchange Code Review Q#108326, answer score: 6
