patternpythonMinor
Downloading list of images
Viewed 0 times
listdownloadingimages
Problem
This code downloads a list of images, each in a different thread.
I have some questions about this code:
-
I don't know how to get information about correcting downloading (sometimes thread just stopped and doesn't do anything: no error, nothing and only part of image has been downloaded). I use the next mechanism for fixing: I wait some time and if thread is alive, I write this image on need_reload file. Is there a better solution?
-
Is it good or bad this using of threads for this task?
-
Are parts of my code not "pythonic"?
```
import os
from threading import Thread
import urllib
import requests
def write_to_failed_image_urls_file(file_name, image_url, failed_image_urls_file):
"""
Check image in file and write it if need
:param file_name: image file name
:param image_url: image URL
:param failed_image_urls_file: name of file with fails
:return: None
"""
with open(failed_image_urls_file, 'a+') as need_reload:
need_reload.seek(0)
lines = need_reload.readlines()
founded = False
for line in lines:
if line.startswith(image_url):
print('File is here')
founded = True
break
if not founded:
need_reload.write(image_url + "," + file_name + '\n')
def write_response_to_file(response, file_name):
with open(file_name, 'wb') as f:
for chunk in response.iter_content(chunk_size=2048):
f.write(chunk)
def load_image_chunk(image_url, file_name, dir_):
"""
Loading image by URL
:param image_url: URL of image
:param file_name: destination file name
:param dir_: destination directory
:return: None
"""
r = requests.get(image_url, stream=True)
if r.status_code == requests.codes.ok:
try:
write_response_to_file(r, file_name)
except OSError as err_:
print(err_.__str__(), 'try redownload...')
file_name = os.path.join(dir_, file_nam
I have some questions about this code:
-
I don't know how to get information about correcting downloading (sometimes thread just stopped and doesn't do anything: no error, nothing and only part of image has been downloaded). I use the next mechanism for fixing: I wait some time and if thread is alive, I write this image on need_reload file. Is there a better solution?
-
Is it good or bad this using of threads for this task?
-
Are parts of my code not "pythonic"?
```
import os
from threading import Thread
import urllib
import requests
def write_to_failed_image_urls_file(file_name, image_url, failed_image_urls_file):
"""
Check image in file and write it if need
:param file_name: image file name
:param image_url: image URL
:param failed_image_urls_file: name of file with fails
:return: None
"""
with open(failed_image_urls_file, 'a+') as need_reload:
need_reload.seek(0)
lines = need_reload.readlines()
founded = False
for line in lines:
if line.startswith(image_url):
print('File is here')
founded = True
break
if not founded:
need_reload.write(image_url + "," + file_name + '\n')
def write_response_to_file(response, file_name):
with open(file_name, 'wb') as f:
for chunk in response.iter_content(chunk_size=2048):
f.write(chunk)
def load_image_chunk(image_url, file_name, dir_):
"""
Loading image by URL
:param image_url: URL of image
:param file_name: destination file name
:param dir_: destination directory
:return: None
"""
r = requests.get(image_url, stream=True)
if r.status_code == requests.codes.ok:
try:
write_response_to_file(r, file_name)
except OSError as err_:
print(err_.__str__(), 'try redownload...')
file_name = os.path.join(dir_, file_nam
Solution
I think this is pretty good code! I have only a few minor suggestions:
-
Rename
-
It took me several times of reading the code to figure out exactly what
-
Web programmers might know what various request status codes mean, but in case a web noob like me reads your code, a comment explaining the meaning of
-
I don't understand the
-
Rename
image and images to something that makes clear they are URLs, not files, not numpy ndarrays, and not any other format. Maybe image_url and image_url_list? The first time I read the code I didn't know what they were. Your docstrings are good, but IMO, its even more Pythonic to have your code speak for itself (with docstrings present as backup obviously).-
It took me several times of reading the code to figure out exactly what
need_reload_file is. Could you name it to be more descriptive? Maybe failed_image_urls_file or something?-
Web programmers might know what various request status codes mean, but in case a web noob like me reads your code, a comment explaining the meaning of
r.status_code == 200 would be helpful.-
I don't understand the
file_name.split('=')[-1] bit. What constraint are you assuming about the structure of file names?Context
StackExchange Code Review Q#97821, answer score: 4
Revisions (0)
No revisions yet.