patternpythonMinor
Download image links posted to reddit.com
Viewed 0 times
imagelinksdownloadcompostedreddit
Problem
This is a Python script to save imgur pictures posted to reddit.com forums. I'm looking for an assessment on the design of this script and any web security issues that might exist.
Obvious shortcomings: it only downloads image links http://i.imgur.com/xx.png not images from imgur pages http://imgur.com/page/ and lacks the ability to get image albums.
```
# purpose: downloads images from imgur.com links posted to specified reddits
# platform: python 3.2
# references:
# - reddit JSON -- https://github.com/reddit/reddit/wiki/API:-info.json
import json
import urllib.request
import os, sys
def downloadPostsFrom(subreddit):
"Returns reddit posts as JSON"
try:
f = urllib.request.urlopen("http://www.reddit.com/r/"+subreddit+"/.json");
return json.loads(f.read().decode("utf-8"))["data"]["children"]
except Exception:
print("ERROR: malformed JSON response from reddit.com")
raise ValueError
def parseImageURL_From(posts):
"Returns tuple iterator (url, title)"
for node in posts:
post = node['data']
# only accept links from imgur.com
if post['domain'].endswith('imgur.com'):
yield post["url"], post["title"]
def makeFileExt(content_type):
"Return extension for specified content type"
return {
'image/bmp':'bmp',
'image/gif':'gif',
'image/jpeg':'jpg',
'image/png':'png',
'image/tiff':'tif',
'image/x-icon':'ico'
}.get(content_type, 'txt')
def makeFileName(title):
"Compute valid file name from image title. Return file name or default.ext"
title = title.strip(' .?:')
file_name = title.translate(''.maketrans('\/:*?"<>|$@ .','__.-_____S0___'))
file_name = file_name.strip('_').replace('__','_').lower()
return file_name + '.' if file_name else 'default.'
def makeSaveDir(save_dir):
"Creates directory. Returns a valid directory name."
if not os.path.exists(save_dir):
os.makedirs(save_dir)
r
Obvious shortcomings: it only downloads image links http://i.imgur.com/xx.png not images from imgur pages http://imgur.com/page/ and lacks the ability to get image albums.
```
# purpose: downloads images from imgur.com links posted to specified reddits
# platform: python 3.2
# references:
# - reddit JSON -- https://github.com/reddit/reddit/wiki/API:-info.json
import json
import urllib.request
import os, sys
def downloadPostsFrom(subreddit):
"Returns reddit posts as JSON"
try:
f = urllib.request.urlopen("http://www.reddit.com/r/"+subreddit+"/.json");
return json.loads(f.read().decode("utf-8"))["data"]["children"]
except Exception:
print("ERROR: malformed JSON response from reddit.com")
raise ValueError
def parseImageURL_From(posts):
"Returns tuple iterator (url, title)"
for node in posts:
post = node['data']
# only accept links from imgur.com
if post['domain'].endswith('imgur.com'):
yield post["url"], post["title"]
def makeFileExt(content_type):
"Return extension for specified content type"
return {
'image/bmp':'bmp',
'image/gif':'gif',
'image/jpeg':'jpg',
'image/png':'png',
'image/tiff':'tif',
'image/x-icon':'ico'
}.get(content_type, 'txt')
def makeFileName(title):
"Compute valid file name from image title. Return file name or default.ext"
title = title.strip(' .?:')
file_name = title.translate(''.maketrans('\/:*?"<>|$@ .','__.-_____S0___'))
file_name = file_name.strip('_').replace('__','_').lower()
return file_name + '.' if file_name else 'default.'
def makeSaveDir(save_dir):
"Creates directory. Returns a valid directory name."
if not os.path.exists(save_dir):
os.makedirs(save_dir)
r
Solution
-
Never do
-
There's no need to do
-
You can eliminate a race condition in
-
You can use string formatting in place of adding strings.
-
Your
There are other things, these were just the quick things from reading top to bottom :).
P.S. There's an unofficial Python wrapper for the reddit API at https://github.com/praw-dev/praw that you could consider using.
Never do
except Exception:, especially not to pass. You probably want to be catching KeyErrors specifically, or using dict.get in the first case, and in the other cases adjust accordingly.-
There's no need to do
json.loads(file_like_object.read()), just use json.load(file_like_object).-
You can eliminate a race condition in
makeSavedir by not calling os.path.exists, just call os.makedirs (and catch the IOError if you'd like to not raise the exception [and make sure to check its errno]).-
You can use string formatting in place of adding strings.
-
Your
try blocks are too big. They will catch various exceptions that you aren't handling. This is another instance of my first point.There are other things, these were just the quick things from reading top to bottom :).
P.S. There's an unofficial Python wrapper for the reddit API at https://github.com/praw-dev/praw that you could consider using.
Context
StackExchange Code Review Q#13945, answer score: 4
Revisions (0)
No revisions yet.