
Simple list comprehension

Submitted by: @import:stackexchange-codereview
comprehension, list, simple

Problem

As far as I can tell, there is no network access happening at this stage in the code. I am accessing Reddit's API via the PRAW module. However, it is crawling, and I think it should be faster, considering I am hardly doing any work (unless Python heavily penalizes object access?).

This is the entire short script:

import sys, os, pprint, praw

class Scanner(object):
    ''' A scanner object. '''
    def __init__(self):
        self.user_agent = 'debian.blah8899889.agent'
        self.r = praw.Reddit(user_agent=self.user_agent)
        self.nsfw = ('funny', 'nsfw')
        self.nsfw_posters = set()
        self.content = []

    def getSub(self, subreddit):
        ''' Accepts a subreddit. Adds subreddit posts object to self.content'''
        url = 'http://www.reddit.com/r/{sub}/'.format(sub=subreddit)
        print 'Scanning:', subreddit
        subreddit_posts = self.r.get_content(url, limit=5)
        self.addContent(subreddit_posts)

    def addContent(self, subreddit):
        print 'Adding subreddit posts to content.'
        self.content.append(subreddit)

    def addNSFWPoster(self, post):
        print 'Parsing author and adding to posters.'
        self.nsfw_posters.add(str(post.author))

    def scanNSFW(self):
        ''' Scans all NSFW subreddits. Makes list of posters. '''
#       Get content from all nsfw subreddits
        print 'Executing map function.'
        map(self.getSub, self.nsfw)
#       Scan content and get authors
        print 'Executing list comprehension.'
        [self.addNSFWPoster(post) for sub in self.content for post in sub]

def main():
    scan = Scanner()
    scan.scanNSFW()

main()


All of the network access should happen at map(self.getSub, self.nsfw). This actually runs quite fast considering I am rate limited by Reddit's servers.

I cannot work out why the list comprehension is so slow. All it should be doing is iterating through some objects and extracting a simple attribute: it should merely get `str(post.autho

Solution

For starters, with

[self.addNSFWPoster(post) for sub in self.content for post in sub]


you are constructing a list of

[None, None, ... , None]


with one entry per post, that is, a length equal to

sum(1 for sub in self.content for post in sub)


that you immediately throw away, because you never assign it to a variable. In other words, use a list comprehension only to build a list, not to call a method for its side effects.
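To see the throwaway list for yourself, here is a minimal sketch. The Recorder class and the sample data are hypothetical stand-ins for Scanner and self.content (no PRAW, no network), and print is written with parentheses so it runs on both Python 2 and 3:

```python
class Recorder(object):
    """Hypothetical stand-in for Scanner; no PRAW or network access."""

    def __init__(self):
        self.seen = set()

    def add(self, item):
        # Like addNSFWPoster: works through a side effect and returns None.
        self.seen.add(str(item))


recorder = Recorder()
# Made-up stand-in for self.content: two "subreddits" with two "posts" each.
content = [["alice", "bob"], ["carol", "alice"]]

# The comprehension collects the return value of every add() call.
throwaway = [recorder.add(post) for sub in content for post in sub]

print(throwaway)              # [None, None, None, None]
print(sorted(recorder.seen))  # ['alice', 'bob', 'carol']
```

The set ends up correct either way. The list of None values is just extra work that is discarded.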

An alternative implementation is, as you might expect, a nested for loop:

for sub in self.content:
    for post in sub:
        self.addNSFWPoster(post)


To speed this up further, you could reduce function call overhead by inlining the body of addNSFWPoster:

for sub in self.content:
    for post in sub:
        print 'Parsing author and adding to posters.'
        self.nsfw_posters.add(str(post.author))
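
If you do not need the per-post print, another option along the same lines is to hand a generator expression to set.update, which avoids both the intermediate list and the explicit method call per post. This is only a sketch: the Post class and the sample data are hypothetical stand-ins for PRAW objects, and whether it is measurably faster in your script is something you would have to time yourself.

```python
class Post(object):
    """Hypothetical stand-in for a PRAW post; only .author is modelled."""

    def __init__(self, author):
        self.author = author


# Made-up stand-in for self.content.
content = [
    [Post("alice"), Post("bob")],
    [Post("carol"), Post("alice")],
]
nsfw_posters = set()

# A single update() call consumes the generator expression lazily,
# so no list of None values is ever built.
nsfw_posters.update(str(post.author) for sub in content for post in sub)

print(sorted(nsfw_posters))  # ['alice', 'bob', 'carol']
```

Inside the class this would read self.nsfw_posters.update(...) with self.content in place of content.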

Context

StackExchange Code Review Q#83675, answer score: 4
