Python parallelization using Popen
Problem
I frequently run a script similar to the one below to analyze an arbitrary number of files in parallel on a computer with 8 cores.
I use Popen to control each subprocess, but sometimes run into problems when there is a lot of stdout or stderr, because the pipe buffer fills up. I work around this by frequently reading from the streams. I also print the streams from one of the processes to help me follow the progress of the analysis.
I'm curious about alternative methods of parallelizing in Python, and would welcome general comments about the implementation, which, as always, has room for improvement. Thanks!
```
import os, sys
import time
import subprocess

def parallelize(analysis_program_path, filenames, N_CORES):
    '''
    Function that parallelizes an analysis on a list of files on N_CORES number of cores
    '''
    running = []
    sys.stderr.write('Starting analyses\n')
    while filenames or running:
        while filenames and len(running) < N_CORES:
            # Submit new analysis
            filename = filenames.pop(0)
            cmd = '%s %s' % (analysis_program_path, filename)
            p = subprocess.Popen(cmd, shell=True,
                                 stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            sys.stderr.write('Analyzing %s\n' % filename)
            running.append((cmd, p))
        i = 0
        while i < len(running):
            (cmd, p) = running[i]
            returncode = p.poll()
            st_out = p.stdout.read()
            st_err = p.stderr.read()  # Read the buffer! Otherwise it fills up and blocks the script
            if i == 0:  # Just print one of the processes
                sys.stdout.write(st_out)
                sys.stderr.write(st_err)
            if returncode is not None:
                # Process has finished; stop tracking it
                running.pop(i)
            else:
                i += 1
        time.sleep(1)
```
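As background to the buffering problem described above: `Popen.communicate()` drains both pipes for you and waits for the child to exit, so a process can never stall on a full pipe buffer. A minimal sketch of that approach (the `python -c` command here is just a hypothetical stand-in for an analysis program):

```python
import subprocess
import sys

# A child process that writes far more than a typical pipe buffer
# (often 64 KiB) can hold in one go.
cmd = [sys.executable, '-c', "print('x' * 1000000)"]
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# communicate() reads stdout and stderr until EOF and then waits,
# so the child cannot deadlock on a full pipe.
st_out, st_err = p.communicate()
print(len(st_out), p.returncode)
```

The trade-off is that `communicate()` blocks until the child finishes, so it does not by itself give the incremental progress output the loop above aims for.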
Solution
Python has what you want built into the standard library: see the multiprocessing module, and in particular the map method of the Pool class. So you can implement what you want in one line, perhaps like this:
```
from multiprocessing import Pool

def parallelize(analysis, filenames, processes):
    '''
    Call `analysis` for each file in the sequence `filenames`, using
    up to `processes` parallel processes. Wait for them all to complete
    and then return a list of results.
    '''
    return Pool(processes).map(analysis, filenames, chunksize=1)
```
Context
StackExchange Code Review Q#20416, answer score: 5