HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Downloading list of images

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
listdownloadingimages

Problem

This code downloads a list of images, each in a different thread.

I have some questions about this code:

-
I don't know how to get information about correcting downloading (sometimes thread just stopped and doesn't do anything: no error, nothing and only part of image has been downloaded). I use the next mechanism for fixing: I wait some time and if thread is alive, I write this image on need_reload file. Is there a better solution?

-
Is it good or bad this using of threads for this task?

-
Are parts of my code not "pythonic"?

```
import os
from threading import Thread
import urllib
import requests

def write_to_failed_image_urls_file(file_name, image_url, failed_image_urls_file):
"""
Check image in file and write it if need
:param file_name: image file name
:param image_url: image URL
:param failed_image_urls_file: name of file with fails
:return: None
"""
with open(failed_image_urls_file, 'a+') as need_reload:
need_reload.seek(0)
lines = need_reload.readlines()
founded = False
for line in lines:
if line.startswith(image_url):
print('File is here')
founded = True
break
if not founded:
need_reload.write(image_url + "," + file_name + '\n')

def write_response_to_file(response, file_name):
with open(file_name, 'wb') as f:
for chunk in response.iter_content(chunk_size=2048):
f.write(chunk)

def load_image_chunk(image_url, file_name, dir_):
"""
Loading image by URL
:param image_url: URL of image
:param file_name: destination file name
:param dir_: destination directory
:return: None
"""
r = requests.get(image_url, stream=True)
if r.status_code == requests.codes.ok:
try:
write_response_to_file(r, file_name)
except OSError as err_:
print(err_.__str__(), 'try redownload...')
file_name = os.path.join(dir_, file_nam

Solution

I think this is pretty good code! I have only a few minor suggestions:

-
Rename image and images to something that makes clear they are URLs, not files, not numpy ndarrays, and not any other format. Maybe image_url and image_url_list? The first time I read the code I didn't know what they were. Your docstrings are good, but IMO, its even more Pythonic to have your code speak for itself (with docstrings present as backup obviously).

-
It took me several times of reading the code to figure out exactly what need_reload_file is. Could you name it to be more descriptive? Maybe failed_image_urls_file or something?

-
Web programmers might know what various request status codes mean, but in case a web noob like me reads your code, a comment explaining the meaning of r.status_code == 200 would be helpful.

-
I don't understand the file_name.split('=')[-1] bit. What constraint are you assuming about the structure of file names?

Context

StackExchange Code Review Q#97821, answer score: 4

Revisions (0)

No revisions yet.