HiveBrain v1.2.0

Batch program to xcopy from host PC to remote destination with multi-processing


Problem

This is a batch program to xcopy from host PC to remote destination with multi-processing.

My environment is Kepler with Python 3.x.

```
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# based on Carnival http://ask.python.kr/users/6970/carnival/

import os
import os.path
import csv
import re

from multiprocessing import Process, Lock


class Server:
    def __init__(self, addr, path):
        self.addr = addr
        self.path = path


def distribute_file(server_list, pathname):
    print("Read from {}".format(pathname))

    with open(pathname, 'rb') as inFile:
        buffer = inFile.read()

    for server in server_list:
        remotepath = "//%s/%s/%s" % (server.addr, server.path, pathname)
        print("Write to {}".format(remotepath))
        with open(remotepath, 'wb') as outFile:
            outFile.write(buffer)


def multi_distribute_file(lo, server, dirpath, filename, path):
    lo.acquire()

    pathname = os.path.join(dirpath, filename)
    print("Read from {}".format(pathname))

    with open(pathname, 'rb') as inFile:
        buffer = inFile.read()

    if path == '':
        remotepath = "//%s/%s/%s" % (server.addr, server.path, filename)
    else:
        remotepath = "//%s/%s/%s/%s" % (server.addr, server.path, path, filename)

    print("Write to {}".format(remotepath))
    with open(remotepath, 'wb') as outFile:
        outFile.write(buffer)

    lo.release()


def make_dir(server_list, path):
    for server in server_list:
        dirpath = "//%s/%s/%s" % (server.addr, server.path, path)
        d = os.path.dirname(dirpath)
        if not os.path.exists(d):
            os.makedirs(d)
            print("Make dir {}".format(d))
        else:
            print("dir {} already exists.".format(d))

        if not os.path.exists(dirpath):
            os.makedirs(dirpath)
            print("Make dir {}".format(dirpath))
        else:
            print("dir {} already exists.".format(dirpath))


def dist_files(server_
```

Solution

Technical issues:

  • You are reading whole files into memory at once. Consider what happens if the directory contains huge files.

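For instance, copying in fixed-size chunks keeps memory use bounded no matter how large the file is. A minimal sketch using `shutil.copyfileobj` (the file names here are made up for the demonstration):

```python
import os
import shutil
import tempfile

def copy_file(src, dst, chunk_size=1024 * 1024):
    # Stream src to dst one chunk at a time instead of read()-ing it whole.
    with open(src, 'rb') as in_file, open(dst, 'wb') as out_file:
        shutil.copyfileobj(in_file, out_file, chunk_size)

# Demonstration on a throwaway 3 MiB file.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'src.bin')
dst = os.path.join(workdir, 'dst.bin')
with open(src, 'wb') as f:
    f.write(b'x' * (3 * 1024 * 1024))
copy_file(src, dst)
print(os.path.getsize(dst))  # 3145728
```

Peak memory stays near `chunk_size` instead of the full file size.
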
  • You use multiprocessing, apparently in an attempt to parallelize things, and use locks to synchronize.

  • First, why do you need a lock at all? What data do you want to protect? The only shared data live in the filesystems, which other processes could change at any time anyway.

  • Next, each call to multi_distribute_file gets its own lock instance, which is used only once, by that call. So no synchronization is happening at all.

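To see why a fresh lock per call provides no exclusion, note that two distinct Lock objects can both be held at the same time (a quick sketch):

```python
from multiprocessing import Lock

# One lock per call, as in multi_distribute_file: two calls, two locks.
lock_a = Lock()
lock_b = Lock()

# Both non-blocking acquires succeed simultaneously, so the two
# "critical sections" could run at the same time: no synchronization.
both_held = lock_a.acquire(block=False) and lock_b.acquire(block=False)
print(both_held)  # True

# Only a second acquire on the *same* lock would be refused.
print(lock_a.acquire(block=False))  # False
```

Mutual exclusion only exists between callers that share one lock object.
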
  • On the other hand, if you used a single lock for all calls to multi_distribute_file, the processes would execute one after another, as each holds the lock for its entire runtime. Your parallelization effort would be in vain.

  • You are spawning a new process for each file being copied. This can create a huge number of processes very quickly. Consider what this will do to your system performance.

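A common way to bound the process count is a worker pool: multiprocessing.Pool keeps a fixed number of workers and feeds them the file list, instead of starting one Process per file. This is a sketch, not the original code; `transfer` is a hypothetical stand-in for the actual per-file copy:

```python
from multiprocessing import Pool

def transfer(filename):
    # Hypothetical stand-in for the real per-file copy work.
    return filename.upper()

def distribute(files, workers=4):
    # The pool caps the process count at `workers`, however long the file list is.
    with Pool(processes=workers) as pool:
        return pool.map(transfer, files)

if __name__ == '__main__':
    print(distribute(['a.txt', 'b.txt', 'c.txt']))
```

Copying 10,000 files still uses only four worker processes.
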
  • If something goes wrong in multi_distribute_file and raises an exception, the lock will never be released. And things will go wrong occasionally, as you are communicating over a network. Use a with statement instead.

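The fix is one line: a with statement releases the lock on every exit path, including an exception. A minimal sketch:

```python
from multiprocessing import Lock

lock = Lock()

# The with block releases the lock even when its body raises.
try:
    with lock:
        raise IOError("network went away mid-copy")
except IOError:
    pass

# The lock is free again; a bare acquire()/release() pair would have
# left it held forever after the exception.
free_again = lock.acquire(block=False)
print(free_again)  # True
```

The same pattern (`with lo:` around the body) would replace the acquire/release pair in multi_distribute_file.
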
  • Don't use string manipulation (format and re) to manipulate paths. Use os.path instead. It is safer and more reliable.

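For example, os.path.join can replace the "%s" plumbing in multi_distribute_file, and it also absorbs the empty-path special case. The server names below are made up, and the separators shown assume a POSIX system (on Windows, os.path.join emits backslashes):

```python
import os.path

def remote_path(addr, share, path, filename):
    # Build //addr/share[/path]/filename; join() skips the empty component.
    return os.path.join('//' + addr, share, path, filename)

# The if/else on path == '' from multi_distribute_file disappears:
print(remote_path('fileserver', 'backup', '', 'app.log'))      # //fileserver/backup/app.log
print(remote_path('fileserver', 'backup', 'logs', 'app.log'))  # //fileserver/backup/logs/app.log
```

One code path handles both cases, and separator handling is delegated to the standard library.
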
Style issues:

  • There are no docstrings and no comments (apart from the header).

  • Avoid single-letter and other abbreviated names like d, l, m, and lo.

  • mydictionary is a meaningless and misleading name. Meaningless, as it doesn't tell us anything about the purpose of the variable, and misleading as it is actually a list, not a dictionary. A better name might be server_list (like you do elsewhere).

Context

StackExchange Code Review Q#31678, answer score: 2
