HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Multithreaded file downloader using threading and signals

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
filesignalsdownloaderthreadingusingandmultithreaded

Problem

This is my first attempt to write a multithreaded application that downloads files from internet. I am looking for improvement in code, logic and better strategy for implementation.

Please ignore the pylint messages which is not the focus at the moment.

To manually test you can use the URLs from the list I have provided.

```
""" Utility to download files over the internet
multithreaded, and uses signal to handle quitting
using KeyBoardInterrupt, prints the status in shell.
"""

import argparse
import logging
import Queue
import urllib2
import os
import signal
import sys
import time
import threading
from socket import error as _SocketError

urls = ["http://broadcast.lds.org/churchmusic/MP3/1/2/nowords/271.mp3",
"http://s1.fans.ge/mp3/201109/08/John_Legend_So_High_Remix(fans_ge).mp3",
"http://megaboon.com/common/preview/track/786203.mp3"]

DESKTOP_PATH = os.path.expanduser("~/Desktop")

queue = Queue.Queue()
STOP_REQUEST = threading.Event()

def unGracefullExit(sigNum, stack):
# restore the original signal handler as otherwise evil things will happen
# when CTRL+C is pressed, and our signal handler is not re-entrant
signal.signal(signal.SIGINT, original_sigint)
# ensure no thread runs after process is killed
STOP_REQUEST.set()
# handle unix signal recived and exit
sys.stderr.write("Received signal %d " % (sigNum))
os.kill(os.getpid(), signal.SIGINT)
# restore the exit gracefully handler here
signal.signal(signal.SIGINT, unGracefullExit)

class ThreadedFetch(threading.Thread):
""" docstring for ThreadedFetch
"""
def __init__(self, queue, logging):
super(ThreadedFetch, self).__init__()
self.queue = queue
self._log = logging
self.kill_received = False

def run(self):

while not STOP_REQUEST.isSet():
# grabs url of link and path to saveTo and save to lst
host = self.queue.get()
start = FileDownloader(self._log, self, self.qu

Solution

I gave this a quick code review. Please ignore anything you disagree with. I was trying to identify non-idiomatic things that would help improve readability, logic, etc. Hopefully some of this is helpful to you.

I would sort the modules below. I usually add spacing between my import and from statements, but that's not required. Sorting them makes them easier to find.

import argparse
import logging
import os
import Queue
import signal
import sys
import time
import threading
import urllib2

from socket import error  as _SocketError


Considering you might have more URLs in the future, you might want to toss this into a config file:

urls = ["http://broadcast.lds.org/churchmusic/MP3/1/2/nowords/271.mp3",
"http://s1.fans.ge/mp3/201109/08/John_Legend_So_High_Remix(fans_ge).mp3",
"http://megaboon.com/common/preview/track/786203.mp3"]


Python idioms dictate using snake case:

def unGracefullExit(sigNum, stack):


should be:

ungraceful_exit(sig_num, stack)


Guessing this is a placeholder, if not, this could use a better docstring:

""" docstring for ThreadedFetch
    """


I would put this onto a separate line:

except OSError: pass


I would also describe the public attributes:

""" FileDownloader class that downloads a file.
    """

    """ Class that provides methods for retrieving a file.

        attributes:
          logging (str): Description of logging.
          queue (str): Description of queue.
          

    """


This is a method, not a class:

def downloadFile(self, url, saveTo=None):
        """ this class performs file download job

            Args:
                url(str): url of the file to download
                saveTo(str): path where you want file
                            to be saved on disk
        """


This logic is a little confusing. I would rewrite and add spacing some spacing. Also, I would change your vars to snake_case vs CamelCase. See the comments inline.

if not saveTo:
            saveTo = DESKTOP_PATH ** change the saveTo var to snake case to make it more pythonic save_to.


Add a space here:

try:
            u = urllib2.urlopen(url)
        except urllib2.URLError , er:


(above) change this to something like urlerror" or just e, I can't help but think of emergency room when I see this. However, an error may be an emergency.

except _SocketError , err:


(above) better then er, However, er and err may be a confusing scheme. If you plan to generalize, I would just use the letter e and keep in consistent.

If you plan to distinguish exceptions, I would use a more descriptive name.

This is an old school print statement, also there is a misspelling:

print >>sys.stderr, "Interrupt recived"


Consider

# add the following to you imports:
           from __future__ import print_function

           # then change your statements to:
           print("fatal error", file=sys.stderr)
         # this also preps you for python 3.


Another alternative may be for you to use the logging module.

It's better to use snake case. Also, it looks like you are mixing two difference schemes. CamelCase, and snake_case. Better to keep a consistent scheme CamelCase for Classes, and snake_case for methods, and vars. ALL CAPS for your constants, some use lowercase for constants as well.

def _fileWriteToDisk(self, saveTo, urlObject, file_name):
    ** This is a better doc string. I would rephrase as: Method that writes downloaded file data to disk.

        """ This method performs the task to write the file downloaded to disk
            Args:
                saveTo(str): location where you want file to be saved on disk
                urlObject(urllib2.urlopen): url object to read the buffer
                file_name(str): name of file to be saved.
        """


More camel and snake mixing:

path = os.path.join(saveTo, file_name)


Use a context manager here:

try:
            f = open(path, 'wb')
        except IOError , er:
            self.queue.task_done()
            self._log.error(er)
            return


Rewrite:

with open(path, ‘wb’) as f:
           some_file = f.read() # please sub with better var names.


I'd prefer a join statement here:

return "%s %s" % (str(amount), suffix)
    ** rewrite
    return ' '.join([str(amount), suffix])


Consider using a generator if you are using Python2:

for i in range(len(urls)):


New:

for i in xrange(len(urls)):


If you are going to have a lot of params, consider using *args:

def function_logger(appName, logFile, fileDebugLevel, file_level, console_level=None):


Change this:

if console_level != None:


New

if console_level:

 if __name__ == "__main__":


See if you can move most of the logic below except for main(myapp) into your main function:

```
# change the name of MainThread.
threading.currentThread().setName("FileDownloader")
myapp = threadi

Code Snippets

import argparse
import logging
import os
import Queue
import signal
import sys
import time
import threading
import urllib2

from socket import error  as _SocketError
urls = ["http://broadcast.lds.org/churchmusic/MP3/1/2/nowords/271.mp3",
"http://s1.fans.ge/mp3/201109/08/John_Legend_So_High_Remix(fans_ge).mp3",
"http://megaboon.com/common/preview/track/786203.mp3"]
def unGracefullExit(sigNum, stack):
ungraceful_exit(sig_num, stack)
""" docstring for ThreadedFetch
    """

Context

StackExchange Code Review Q#54935, answer score: 2

Revisions (0)

No revisions yet.