patternpythonMinor
Download images from an image board based on tags and blacklists
Viewed 0 times
imageandtagsblacklistsbaseddownloadimagesfromboard
Problem
This code is meant to download images from an image board to create a local backup. Images are grouped by tags selected by the user and then downloaded using multiprocessing. Normal use of this program would be daily or weekly, resulting in a typical download queue of no more than 100. However, if the user opts to download images from all time which match their query, the queue can grow up to 100,000. The way I have it written now uses tuples. This is not a problem in some cases, but it is extremely slow parse and can cause crashes on machines with smaller memory pools. In addition to that major flaw, it is one of my first python programs, so I expect it to be messy and/or hacky in a few other places.
It is a fairly large project, which has only grown larger and messier as I think of new features for it, but I trust the stack exchange community a lot when it comes to issues I have. It might even be a ton larger than necessary for what it tries to accomplish.
Main
```
#!/usr/bin/env python
import logging
import os
import sys
from multiprocessing import freeze_support
from collections import namedtuple
from lib import constants, support, api, downloader
if __name__ == '__main__':
freeze_support()
logging.basicConfig(level = support.get_verbosity(), format = constants.LOGGER_FORMAT,
stream = sys.stderr)
LOG = logging.getLogger('program')
LOG.info('Running program version ' + constants.VERSION + '.')
CONFIG = support.get_config('config.ini')
early_terminate = False
early_terminate |= not downloader.internet_connected()
early_terminate |= support.validate_tags(CONFIG)
if early_terminate:
LOG.info('Error(s) occurred during initialization, see above for more information.')
sys.exit(-1)
GROUP = namedtuple('Group', 'tags directory')
blacklist = []
tag_groups = []
LOG.info('Parsing config for blacklist and settings.')
for section in CONFIG.sections():
if section == 'Settings':
It is a fairly large project, which has only grown larger and messier as I think of new features for it, but I trust the stack exchange community a lot when it comes to issues I have. It might even be a ton larger than necessary for what it tries to accomplish.
Main
```
#!/usr/bin/env python
import logging
import os
import sys
from multiprocessing import freeze_support
from collections import namedtuple
from lib import constants, support, api, downloader
if __name__ == '__main__':
freeze_support()
logging.basicConfig(level = support.get_verbosity(), format = constants.LOGGER_FORMAT,
stream = sys.stderr)
LOG = logging.getLogger('program')
LOG.info('Running program version ' + constants.VERSION + '.')
CONFIG = support.get_config('config.ini')
early_terminate = False
early_terminate |= not downloader.internet_connected()
early_terminate |= support.validate_tags(CONFIG)
if early_terminate:
LOG.info('Error(s) occurred during initialization, see above for more information.')
sys.exit(-1)
GROUP = namedtuple('Group', 'tags directory')
blacklist = []
tag_groups = []
LOG.info('Parsing config for blacklist and settings.')
for section in CONFIG.sections():
if section == 'Settings':
Solution
As Alex says, it looks good, but do use flake8, and consider making a generator.
I get NXDOMAIN for x.json - it wasn't clear what those DNS references were supposed to do.
After
Rather than your three lines, this pair of lines seems more pythonic:
Your heart was in the right place, naming it to offer the Gentle Reader a hint, but it seems clear enough, so an
This is odd, I recommend deleting it:
Here is the tiniest of nits: in support.py, PEP8 asks for a blank line before
I sort of expected main to crack sys.argv with
rather than support's get_verbosity. It left me wondering if other functions would be poking around at argv.
In the predicate validate_tags(),
In api get_alias(), it would be natural to define user_tags with a list comprehension. Similarly for aliased_tags. Better, obtain
You defined LOG as a global, yet you create other LOG locals, e.g.:
Consider defining a helper so you can say:
Consider setting a constant to 36 in this way:
That 2nd line should obsolete the isinstance int test.
I get NXDOMAIN for x.json - it wasn't clear what those DNS references were supposed to do.
After
if __name__ == '__main__': you started defining capitalized GLOBALS and doing usual setup steps, which is fine. But then it looks like that chunk of code kept growing, at which point you should def main to avoid polluting the global namespace with identifiers like section.Rather than your three lines, this pair of lines seems more pythonic:
early_terminate = (not downloader.internet_connected()
or support.validate_tags(CONFIG))Your heart was in the right place, naming it to offer the Gentle Reader a hint, but it seems clear enough, so an
if with that expression would suffice.This is odd, I recommend deleting it:
sys.exit(0)Here is the tiniest of nits: in support.py, PEP8 asks for a blank line before
import constants. As a reader, I appreciate that you grouped your imports, I found it helpful.I sort of expected main to crack sys.argv with
args = parser.parse_args()rather than support's get_verbosity. It left me wondering if other functions would be poking around at argv.
In the predicate validate_tags(),
sections = len(list(config.sections())) seems more natural. Consider renaming it to has_valid_tags(). Consider turning the local list illegals into a global set that is initialized just once.In api get_alias(), it would be natural to define user_tags with a list comprehension. Similarly for aliased_tags. Better, obtain
results, verify it is non-empty, and then define result to be the only element you care about, the first one. That would simplify obtaining user_tag and aliased_tag.You defined LOG as a global, yet you create other LOG locals, e.g.:
LOG = logging.getLogger('internet')Consider defining a helper so you can say:
get_log('internet').info('No internet connection detected.')Consider setting a constant to 36 in this way:
def update_progress(downloaded, total, bar_length=36):
progress = float(downloaded / float(total))That 2nd line should obsolete the isinstance int test.
Code Snippets
early_terminate = (not downloader.internet_connected()
or support.validate_tags(CONFIG))sys.exit(0)args = parser.parse_args()LOG = logging.getLogger('internet')get_log('internet').info('No internet connection detected.')Context
StackExchange Code Review Q#151510, answer score: 2
Revisions (0)
No revisions yet.