Simple list comprehension
Problem
I am accessing Reddit's API via the PRAW module. As far as I can tell, no network access happens at this stage in the code, yet it is crawling along. I think it should be faster, considering I am hardly doing any work (unless Python heavily penalizes object access?).
This is the entire short script:
import sys, os, pprint, praw

class Scanner(object):
    ''' A scanner object. '''
    def __init__(self):
        self.user_agent = 'debian.blah8899889.agent'
        self.r = praw.Reddit(user_agent=self.user_agent)
        self.nsfw = ('funny', 'nsfw')
        self.nsfw_posters = set()
        self.content = []

    def getSub(self, subreddit):
        ''' Accepts a subreddit. Adds subreddit posts object to self.content'''
        url = 'http://www.reddit.com/r/{sub}/'.format(sub=subreddit)
        print 'Scanning:', subreddit
        subreddit_posts = self.r.get_content(url, limit=5)
        self.addContent(subreddit_posts)

    def addContent(self, subreddit):
        print 'Adding subreddit posts to content.'
        self.content.append(subreddit)

    def addNSFWPoster(self, post):
        print 'Parsing author and adding to posters.'
        self.nsfw_posters.add(str(post.author))

    def scanNSFW(self):
        ''' Scans all NSFW subreddits. Makes list of posters. '''
        # Get content from all nsfw subreddits
        print 'Executing map function.'
        map(self.getSub, self.nsfw)
        # Scan content and get authors
        print 'Executing list comprehension.'
        [self.addNSFWPoster(post) for sub in self.content for post in sub]

def main():
    scan = Scanner()
    scan.scanNSFW()

main()

All of the network access should happen at `map(self.getSub, self.nsfw)`. This actually runs quite fast considering I am rate limited by Reddit's servers.

I cannot work out why the list comprehension is so slow. All it should be doing is iterating through some objects and extracting a simple attribute: it should merely get `str(post.autho
Solution
Well, for starters, with

[self.addNSFWPoster(post) for sub in self.content for post in sub]

you are constructing a list of

[None, None, ... , None]

with one entry per post (roughly len(self.content) * len(sub), if every sub held the same number of posts), which you immediately throw away because you aren't assigning it to a variable. In other words, you should only use a list comprehension for building a list, not for calling a method that does its work elsewhere.
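To make the throwaway list concrete, here is a minimal sketch using a hypothetical stand-in `Collector` class (not part of the original script, and with no PRAW involved). It is written for Python 3, whereas the script in the question is Python 2. Because the method returns `None`, the comprehension's result is just a list of `None` values, one per post:

```python
class Collector(object):
    """Hypothetical stand-in for the Scanner class in the question."""

    def __init__(self):
        self.posters = set()

    def add_poster(self, author):
        # Does its work through a side effect and implicitly returns None.
        self.posters.add(str(author))


collector = Collector()
# Two "subreddits", with plain strings standing in for post authors.
content = [['alice', 'bob'], ['carol', 'alice', 'dave']]

# Same shape as the comprehension in the question.
result = [collector.add_poster(author) for sub in content for author in sub]

print(result)                     # a list holding only None values
print(len(result))                # one entry per post across all subs
print(sorted(collector.posters))  # the side effect still happened
```

The set is filled either way; the list bound to `result` is pure overhead, which is why a plain loop is the better fit.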
An alternative implementation is, as you might expect, a nested for loop:

for sub in self.content:
    for post in sub:
        self.addNSFWPoster(post)

To speed this up further you could reduce your function call overhead with the following:

for sub in self.content:
    for post in sub:
        print 'Parsing author and adding to posters.'
        self.nsfw_posters.add(str(post.author))
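If the goal is only to collect the authors, another option, which is not taken from the answer above, is to let the set consume a generator expression via `set.update`. No intermediate list is built and there is no per-post method call. The sketch below is Python 3 and uses a hypothetical `Post` namedtuple in place of PRAW's post objects:

```python
from collections import namedtuple

# Hypothetical stand-in for a PRAW post; only the attribute used here.
Post = namedtuple('Post', ['author'])

content = [
    [Post('alice'), Post('bob')],
    [Post('carol'), Post('alice')],
]

nsfw_posters = set()
# The generator expression yields one author string at a time;
# set.update consumes it without materialising a throwaway list.
nsfw_posters.update(str(post.author) for sub in content for post in sub)

print(sorted(nsfw_posters))
```

Inside the original class this would presumably become a `self.nsfw_posters.update(...)` call over `self.content`. Whether it is measurably faster depends on how expensive it is to iterate each `sub`, so it is worth timing before relying on it.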
Context
StackExchange Code Review Q#83675, answer score: 4