HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Download image links posted to reddit.com

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
imagelinksdownloadcompostedreddit

Problem

This is a Python script to save imgur pictures posted to reddit.com forums. I'm looking for an assessment on the design of this script and any web security issues that might exist.

Obvious shortcomings: it only downloads image links http://i.imgur.com/xx.png not images from imgur pages http://imgur.com/page/ and lacks the ability to get image albums.

```
# purpose: downloads images from imgur.com links posted to specified reddits
# platform: python 3.2
# references:
# - reddit JSON -- https://github.com/reddit/reddit/wiki/API:-info.json

import json
import urllib.request
import os, sys

def downloadPostsFrom(subreddit):
"Returns reddit posts as JSON"
try:
f = urllib.request.urlopen("http://www.reddit.com/r/"+subreddit+"/.json");
return json.loads(f.read().decode("utf-8"))["data"]["children"]
except Exception:
print("ERROR: malformed JSON response from reddit.com")
raise ValueError

def parseImageURL_From(posts):
"Returns tuple iterator (url, title)"
for node in posts:
post = node['data']
# only accept links from imgur.com
if post['domain'].endswith('imgur.com'):
yield post["url"], post["title"]

def makeFileExt(content_type):
"Return extension for specified content type"
return {
'image/bmp':'bmp',
'image/gif':'gif',
'image/jpeg':'jpg',
'image/png':'png',
'image/tiff':'tif',
'image/x-icon':'ico'
}.get(content_type, 'txt')

def makeFileName(title):
"Compute valid file name from image title. Return file name or default.ext"
title = title.strip(' .?:')
file_name = title.translate(''.maketrans('\/:*?"<>|$@ .','__.-_____S0___'))
file_name = file_name.strip('_').replace('__','_').lower()
return file_name + '.' if file_name else 'default.'

def makeSaveDir(save_dir):
"Creates directory. Returns a valid directory name."
if not os.path.exists(save_dir):
os.makedirs(save_dir)
r

Solution

-
Never do except Exception:, especially not to pass. You probably want to be catching KeyErrors specifically, or using dict.get in the first case, and in the other cases adjust accordingly.

-
There's no need to do json.loads(file_like_object.read()), just use json.load(file_like_object).

-
You can eliminate a race condition in makeSavedir by not calling os.path.exists, just call os.makedirs (and catch the IOError if you'd like to not raise the exception [and make sure to check its errno]).

-
You can use string formatting in place of adding strings.

-
Your try blocks are too big. They will catch various exceptions that you aren't handling. This is another instance of my first point.

There are other things, these were just the quick things from reading top to bottom :).

P.S. There's an unofficial Python wrapper for the reddit API at https://github.com/praw-dev/praw that you could consider using.

Context

StackExchange Code Review Q#13945, answer score: 4

Revisions (0)

No revisions yet.