patternpythonMinor
Download a file from given URL and retry on connection errors
Viewed 0 times
fileconnectionanddownloaderrorsfromgivenurlretry
Problem
I wrote a script for Python 2.78 which is supposed to download and save a file from a given URL using the requests library.
In case that a connection to the server can be established and a valid response is received, the response (e.g. a HTML file) should be downloaded to hard disk.
If the server gives a bad HTTP response code (anything but 200) or no connection could be established at all, a revealing error message should be printed, giving the specific type of error (e.g. 'connection refused' or 'HTTP 400 response code'. In this case, the download attempt should be retried until a predefined maximum of download tries is exceeded.
The URL and other connection parameters such as connection timeout, read timeout, maximum download attempts etc. as well as the location and filename for downloading the file can be specified on top of the script within the
The script works for me as is. However, I get the impression that it is overly complicated and includes too much nesting of the function which reduces readability. Moreover, the two configuration dictionaries are passed through several functions which I have the impression is hard to follow. Any suggestions on how to improve this code in terms of readability, simplicity or good coding style is appreciated!
```
###########
# IMPORTS #
###########
# file system browsing
import glob, os
# http services / file downloading
import requests
from requests.auth import HTTPBasicAuth
# sleeping
import time
###########
# CONFIGURATION #
###########
# get path from which the script is being run
import os
path = os.path.dirname(os.path.abspath(__file__))
# useful test urls:
# http://httpbin.org/basic-auth/user/passwd connectionParameters['maxAttempts']:
raise MaxAttemptsException('MaxAttemptsException: Failed {0} attempts to download the file. Exiting...'.format(connectionParameters['maxAttem
In case that a connection to the server can be established and a valid response is received, the response (e.g. a HTML file) should be downloaded to hard disk.
If the server gives a bad HTTP response code (anything but 200) or no connection could be established at all, a revealing error message should be printed, giving the specific type of error (e.g. 'connection refused' or 'HTTP 400 response code'. In this case, the download attempt should be retried until a predefined maximum of download tries is exceeded.
The URL and other connection parameters such as connection timeout, read timeout, maximum download attempts etc. as well as the location and filename for downloading the file can be specified on top of the script within the
CONFIGURATION section. There can also be found some example URLs which mock bad server behaviour (based on httpbin).The script works for me as is. However, I get the impression that it is overly complicated and includes too much nesting of the function which reduces readability. Moreover, the two configuration dictionaries are passed through several functions which I have the impression is hard to follow. Any suggestions on how to improve this code in terms of readability, simplicity or good coding style is appreciated!
```
###########
# IMPORTS #
###########
# file system browsing
import glob, os
# http services / file downloading
import requests
from requests.auth import HTTPBasicAuth
# sleeping
import time
###########
# CONFIGURATION #
###########
# get path from which the script is being run
import os
path = os.path.dirname(os.path.abspath(__file__))
# useful test urls:
# http://httpbin.org/basic-auth/user/passwd connectionParameters['maxAttempts']:
raise MaxAttemptsException('MaxAttemptsException: Failed {0} attempts to download the file. Exiting...'.format(connectionParameters['maxAttem
Solution
Python's naming case is
For example, this:
becomes:
And, as another example, this:
becomes this:
However, for classes, the naming case is
It is a good idea to put all your
-
In the next section, you
-
The
Since you are already
You have some fairly long function names:
-
-
-
You seem to have some unnecessary words in these names. I would change these to:
-
-
-
You don't need to be so specific with
When you call the method
You should be passing an exit/error code. The exit code describes how execution went to external programs in a single number.
Any non-zero number means that something went wrong. Therefore, for example, when you are
This seems awfully unnecessary:
Why are you creating a variable with an extremely simplistic value, and then having it returned from the function in the very next line?
Wouldn't it make a lot more sense to just do:
The same sort-of goes to these lines:
Why are you creating a brand new variable to only have it destroyed in the next line?
You should just merge the content of
You do this in a few other spots, along with the above recommendation. You should fix those spots too.
Your function
Oh, hey! Me? Yeah, I check to make sure that the response status from
the server is a good one.
However, what it really is doing is this:
Oh, hey! Me? Yeah, I save the response file to the disk.
This leads me to my next point.
Starting from the first function you call,
What I mean is, the
Why? Well (assuming no errors have occured),
-
-
-
To fix this, I'd say that you have two options:
-
Since these functions do so little, just group them into one function.
-
Move all error catching to one function, but keep the others doing their job.
I personally would choose the first one, because if you were to string the error catching and returns from all these functions, you would literally be left with a single line.
Here is how I would write the beginning of
Then, you do the error catching that you were doing in the other functions:
```
except requests.RequestException as e: #from getResponseFromServer
[error message]
except requests.HTTPError as e: #from checkHTTPresponseStatus
[error message]
except EnvironmentError as e: #from saveFile
[error message]
attempt = r
snake_case (makes sense, right?); you are writing your identifiers and functions is camelCase.For example, this:
connectionParametersbecomes:
connection_parametersAnd, as another example, this:
cleanUpDownloadFolderbecomes this:
cleanup_download_folderHowever, for classes, the naming case is
PascalCase, which you did perfectly well for MaxAttemptsException.It is a good idea to put all your
imports together, as you did under your IMPORTS section. However, there are two problems:-
In the next section, you
import os again.-
The
import os in the next section is not up in the IMPORTS section with the other imports.Since you are already
importing os in the IMPORTS section, you should just completely remove this line from CONFIGURATION:import osYou have some fairly long function names:
-
tryToDownloadXmlFileFromServer-
cleanUpDownloadFolder-
checkHTTPresponseStatusYou seem to have some unnecessary words in these names. I would change these to:
-
download_xml-
clean_downloads-
check_responseYou don't need to be so specific with
tryTo and Folder and Server... since your program is all about those, these topics should just be implied in your function names.When you call the method
sys.exit:exit()You should be passing an exit/error code. The exit code describes how execution went to external programs in a single number.
Any non-zero number means that something went wrong. Therefore, for example, when you are
exiting from the program in retry, you should be passing a non-zero number to show that something went wrong.This seems awfully unnecessary:
success = False
return successWhy are you creating a variable with an extremely simplistic value, and then having it returned from the function in the very next line?
Wouldn't it make a lot more sense to just do:
return FalseThe same sort-of goes to these lines:
success = checkHTTPresponseStatus(response, connectionParameters, downloadParameters)
return successWhy are you creating a brand new variable to only have it destroyed in the next line?
You should just merge the content of
success straight to the return statement, like this:return checkHTTPresponseStatus(response, connectionParameters, downloadParameters)You do this in a few other spots, along with the above recommendation. You should fix those spots too.
Your function
checkHTTPresponseStatus is not doing what it seems like it should be doing. The title tells me,Oh, hey! Me? Yeah, I check to make sure that the response status from
the server is a good one.
However, what it really is doing is this:
Oh, hey! Me? Yeah, I save the response file to the disk.
This leads me to my next point.
Starting from the first function you call,
tryToDownloadXmlFileFromServer, all the other functions that are called in this one are just a chain of results.What I mean is, the
success variable in tryToDownloadXmlFileFromServer is going to be the return value of saveFile unless an error occurred along the way; in that case it'd be false.Why? Well (assuming no errors have occured),
-
checkHTTPresponseStatus returns the return of saveFile-
getResponseFromServer returns the return of checkHTTPresponseStatus-
tryToDownloadXmlFileFromServer uses the return of getResponseFromServer.To fix this, I'd say that you have two options:
-
Since these functions do so little, just group them into one function.
-
Move all error catching to one function, but keep the others doing their job.
I personally would choose the first one, because if you were to string the error catching and returns from all these functions, you would literally be left with a single line.
Here is how I would write the beginning of
tryToDownloadXmlFileFromServer:def download_xml(connection_params, download_params):
attempt = 1
while True
try:
print('Downloading xml file from {0}, attempt {1}/{2}\n'.format(connectionParameters['url'], attempt, connectionParameters['maxAttempts']))
response = requests.get(url=connectionParameters['url'], auth=HTTPBasicAuth(connectionParameters['user'], connectionParameters['password']), timeout=(connectionParameters['connectTimeout'], connectionParameters['readTimeout']), verify=True)
response.raise_for_status()
with open(os.path.join(downloadParameters['folder'], downloadParameters['filename']), 'wb') as outputFile:
outputFile.write(response.content)Then, you do the error catching that you were doing in the other functions:
```
except requests.RequestException as e: #from getResponseFromServer
[error message]
except requests.HTTPError as e: #from checkHTTPresponseStatus
[error message]
except EnvironmentError as e: #from saveFile
[error message]
attempt = r
Code Snippets
connectionParametersconnection_parameterscleanUpDownloadFoldercleanup_download_foldersuccess = False
return successContext
StackExchange Code Review Q#81967, answer score: 7
Revisions (0)
No revisions yet.