patternpythonMinor
File copy in Python for slow networks
Viewed 0 times
filenetworksslowforpythoncopy
Problem
Please note: I have a follow up question with updated code, here.
I have written a file-copy routine, as I experienced unreliable results using standard Python when copying file over very slow networks.
The requirements are as follows:
So this is what I have, and it tested to be working OK.
I am looking for tips/tricks to improve this code
PS, I know I haven't done "docstrings" yet... (on my to do list)
Usage:
ccopy.py:
```
import os
import uuid
import hashlib
import platform
def normalizefilepath(path):
#inserts '\\?\' at the start of a path to get rid of 256 character limit of Windows
if path[1:3] == ':\\':
return u'\\\\?\\' + os.path.normcase(path)
return os.path.normcase(path)
def hashfile(filepath, hashtype='md4', buffersize=1024):
ha = hashlib.new(hashtype)
f = open(filepath, 'rb')
while True:
chunk = f.read(buffersize)
if not chunk:
break
ha.update(chunk)
f.close()
return ha.digest()
def filecopy(source, target, hashcheck = False, hashtype = 'md4', buffersize = 1024, overwrite = True, consoleoutput = False):
if not platform.system() == 'Windows': raise Exception('Incorre
I have written a file-copy routine, as I experienced unreliable results using standard Python when copying file over very slow networks.
The requirements are as follows:
- to be used on Windows only
- not to be limited by the 256 character limit of Windows
- configurable when to use hash checksum checking (source / target)
- configurable "buffer size" or "chunk size" (it seems that smaller chunks are more reliable on unreliable/slow networks)
- file only available on target file system AFTER all content has copied. If you copy files, the target filename already exists on the target filesystem, hence a remote application may "see" this file and try to open it while copy still in progress. This is implemented with a temp-target-filename in the code below.
- configurable console output
So this is what I have, and it tested to be working OK.
I am looking for tips/tricks to improve this code
PS, I know I haven't done "docstrings" yet... (on my to do list)
Usage:
import ccopy
source = r'c:\test1\testfile.ext'
target = r'\\someserver\test2\testfile.ext'
ccopy.filecopy(source, target, True, 'md4', 1024, True, True)ccopy.py:
```
import os
import uuid
import hashlib
import platform
def normalizefilepath(path):
#inserts '\\?\' at the start of a path to get rid of 256 character limit of Windows
if path[1:3] == ':\\':
return u'\\\\?\\' + os.path.normcase(path)
return os.path.normcase(path)
def hashfile(filepath, hashtype='md4', buffersize=1024):
ha = hashlib.new(hashtype)
f = open(filepath, 'rb')
while True:
chunk = f.read(buffersize)
if not chunk:
break
ha.update(chunk)
f.close()
return ha.digest()
def filecopy(source, target, hashcheck = False, hashtype = 'md4', buffersize = 1024, overwrite = True, consoleoutput = False):
if not platform.system() == 'Windows': raise Exception('Incorre
Solution
I don't think there is much more to do/check in terms of functionality, but I'd like to point out a few coding style issues (in my opinion).
-
Be careful when checking paths. If you are taking any kind of user input, keep in mind that
-
I'd move all the checks you have at the beginning of
-
You keep using
Maybe something like this:
So the rest of the code could go from this:
to this:
-
Be careful when checking paths. If you are taking any kind of user input, keep in mind that
.. is a valid path. You don't want people to use your module to copy files that they're not supposed to. If you're in a situation where you have to handle user-supplied paths, validate them AND use a whitelist to determine if they should be allowed to read the file. Have a look at abspath and/or realpath.-
I'd move all the checks you have at the beginning of
filecopy in a separate function. You may need those same checks at a later time, you never know.-
You keep using
if consoleoutput. This suggest that a function could be handy.Maybe something like this:
def debug(messages):
if (consoleoutput):
for message in messages:
print(message + '\n');So the rest of the code could go from this:
if consoleoutput: print('Source File and Temp Target File Hashlib/' + hashtype + ' match')
if consoleoutput: print()
if consoleoutput: print('Renaming File')
if consoleoutput: print('Temp Target File: ' + tempfilepath)
if consoleoutput: print('Target File : ' + targetfilepath)to this:
debug(['Source File and Temp Target File Hashlib/' + hashtype + ' match',
'',
'Renaming File',
'Temp Target File: ' + tempfilepath,
'Target File : ' + targetfilepath])Code Snippets
def debug(messages):
if (consoleoutput):
for message in messages:
print(message + '\n');if consoleoutput: print('Source File and Temp Target File Hashlib/' + hashtype + ' match')
if consoleoutput: print()
if consoleoutput: print('Renaming File')
if consoleoutput: print('Temp Target File: ' + tempfilepath)
if consoleoutput: print('Target File : ' + targetfilepath)debug(['Source File and Temp Target File Hashlib/' + hashtype + ' match',
'',
'Renaming File',
'Temp Target File: ' + tempfilepath,
'Target File : ' + targetfilepath])Context
StackExchange Code Review Q#147290, answer score: 5
Revisions (0)
No revisions yet.