HiveBrain v1.2.0
pattern, python, Minor

Just implemented a multiprocessing and queue demo; are there any improvements?

Submitted by: @import:stackexchange-codereview

Problem

I have a list of URLs and want to start multiple processes that each fetch a URL's web content, then quit once all the URLs have been fetched.

Here is my implementation. I'm not sure whether it's the right way to do this:

#coding=utf8

import multiprocessing
from multiprocessing import JoinableQueue
import urllib2
import logging
import os

logging.basicConfig(level=logging.DEBUG)

URLS = [
        'http://stackoverflow.com/q/2243542/94962',
        'http://docs.python.org/library/logging.html',
        'http://www.python.org/dev/peps/pep-3101/',
        'http://news.ycombinator.com/',
        'http://www.evernote.com/about/learn_more/',
        'http://news.php.net/php.internals/55293',
        ]

POOL_SIZE = multiprocessing.cpu_count()

DEST_DIR = '/tmp/pytest/'

url_q = JoinableQueue()

class Worker(multiprocessing.Process):

    def run(self):
        while True:
            try:
                url = url_q.get()
                logging.info('%(process_name)s processing %(url)s' % {
                    'process_name': multiprocessing.current_process().name,
                    'url':url,
                    })
                web_cnt = urllib2.urlopen(url).read()
                url_filename = url[7:].replace('/', '-').strip('.html') + '.html'
                with open(os.path.join(DEST_DIR, url_filename), 'w') as f:
                    f.write(web_cnt)
                url_q.task_done()
            except Exception:
                logging.exception('error')

workers = []
for i in range(POOL_SIZE):
    worker = Worker()
    worker.name = 'worker%s'%i
    workers.append(worker)
    worker.start()

for url in URLS:
    url_q.put(url)

url_q.join()

print 'workers have done stuff'

for worker in workers:
    worker.terminate()

Solution

When you catch the exception, I suggest logging the complete exception so you can tell what went wrong.
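
As a rough illustration of logging the complete exception, the sketch below relies on the standard library's `logger.exception`, which logs at ERROR level and appends the traceback. It is written for Python 3 (the question's code is Python 2), and the logger name `fetcher` and the helper `run_task` are hypothetical names introduced only for this example.

```python
import logging

# Hypothetical logger name, used only in this sketch.
logger = logging.getLogger("fetcher")


def run_task(task, item):
    """Call task(item); on failure, log the full traceback and return False."""
    try:
        task(item)
        return True
    except Exception:
        # logger.exception() must be called from an except block. It logs the
        # message at ERROR level and appends the exception type, message,
        # and traceback.
        logger.exception("failed to process %r", item)
        return False
```

With this in place, a failed fetch shows up in the log with the exception type and the line that raised it, rather than only a generic 'error' message.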

I suggest not using abbreviations like web_cnt. Code is easier to read when you spell things out.

There isn't really a reason to use multiprocessing here, since your tasks are I/O-bound, not CPU-bound. I'd probably use the eventlet library for this.
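
To illustrate the I/O-bound point without a third-party dependency, here is a minimal sketch that swaps in standard-library threads (`concurrent.futures.ThreadPoolExecutor`) in place of eventlet. It is not the eventlet approach itself, only the same idea of overlapping network waits inside one process. It assumes Python 3, and the names `download` and `fetch_all_threaded` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import urllib.request


def download(url):
    """Return the raw bytes of one page (blocks on network I/O)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def fetch_all_threaded(urls, fetch=download, max_workers=8):
    """Fetch every URL concurrently; return {url: bytes or Exception}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the URL it was submitted for.
        futures = {executor.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # Record the failure and keep going with the other URLs.
                results[url] = exc
    return results
```

Because each thread spends most of its time waiting on the network, the threads overlap usefully even though they share one interpreter. The `fetch` parameter exists only so the function can be exercised without network access.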

multiprocessing has a Pool class; about half of your code duplicates its behavior.
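
A rough sketch of what the Pool-based version could look like, assuming Python 3 and reusing DEST_DIR from the question. The helper names `url_to_filename`, `fetch_and_save`, and `fetch_all` are hypothetical. The suffix handling deliberately differs from the question's `strip('.html')`, which removes characters rather than a suffix.

```python
import multiprocessing
import os
import urllib.request

DEST_DIR = "/tmp/pytest/"  # same destination directory as in the question


def url_to_filename(url):
    """Turn a URL into a flat file name ending in .html."""
    name = url.split("://", 1)[-1].replace("/", "-")
    # str.strip('.html') removes any of the characters '.', 'h', 't', 'm', 'l'
    # from both ends, so drop the suffix explicitly instead.
    if name.endswith(".html"):
        name = name[: -len(".html")]
    return name + ".html"


def fetch_and_save(url):
    """Download one URL into DEST_DIR; return (url, error), error is None on success."""
    try:
        with urllib.request.urlopen(url, timeout=30) as response:
            content = response.read()
        with open(os.path.join(DEST_DIR, url_to_filename(url)), "wb") as f:
            f.write(content)
        return url, None
    except Exception as exc:
        # Return the failure instead of raising so one bad URL
        # does not abort the whole map() call.
        return url, repr(exc)


def fetch_all(urls, pool_size=None):
    """Fetch all URLs with a process pool; pool_size=None means cpu_count()."""
    os.makedirs(DEST_DIR, exist_ok=True)
    with multiprocessing.Pool(pool_size) as pool:
        # map() blocks until every URL has been handled, so no explicit
        # queue, join(), or terminate() loop is needed.
        return pool.map(fetch_and_save, urls)
```

Calling `fetch_all(URLS)` under an `if __name__ == '__main__':` guard would then stand in for the Worker class, the JoinableQueue, and the start/terminate loops.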

Context

StackExchange Code Review Q#4699, answer score: 2
