HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Streaming hacker news posts

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
newspostsstreaminghacker

Problem

Hacker News recently released an official API. Unfortunately, it cannot be used to get the newest posts submitted. I wrote a script to provide a faux streaming-api like method (like praw reddit's submission_stream) so that I can simply put it in a loop and do something like make a gtk-notification when title contains the string 'python'.

```
"""
Usage:

from hn import HNStream

h = HNStream()

for item in h.stream():
print "Title: ", item.title
print "By user: ", item.by
"""

from lxml import html as lh
import re
import requests
import threading
import time
import ujson as json

__all__ = ['HNStream']

class Item:

def __init__(self, **kwargs):
self.__dict__.update(kwargs)

def __str__(self):
return "{}: {}".format(self.type, self.title)

__repr__ = __str__

class HackerNews:

def __init__(self):
self.base_url = 'https://hacker-news.firebaseio.com/v0/{api}'

def get(self, uri):
response = requests.get(uri)
if response.status_code == requests.codes.ok:
return json.loads(response.text)
else:
raise Exception('HTTP Error {}'.format(response.status_code))

def item(self, item_id):
uri = self.base_url.format(**{'api': 'item/{}.json'.format(item_id)})
return Item(**self.get(uri))

class SimpleFIFO:

def __init__(self, length):
self.length = length
self.values = length * [None]

def contains(self, iid):
return iid in self.values

def append(self, value):
assert isinstance(value, str), "Only append strings"
self.values.append(value)
while len(self.values) > self.length:
self.values.pop(0)
assert len(self.values) == self.length

def __str__(self):
return str(self.values)

class HNStream(threading.Thread):

def __init__(self):
threading.Thread.__init__(self)

def stream(self):

itembuffer = SimpleFIFO(length=30)
while True:

Solution

I have a number of suggestions:

Don't use self.dict

def __init__(self, **kwargs):
    self.__dict__.update(kwargs)


It is pretty unclear which attributes the class will have after calling the constructor, as it is basically up to the caller to decide this. This is not optimal, as you want to be in control of what data is set in an item. I'd suggest to use named arguments instead:

def __init__(self, id, type, title): 
    self.type = type
    self.title = title
    self.id = id
    # Add other attributes that I don't know of?


While this may seem like blowing up the code, it makes it a lot easier to understand for an uninformed reader, and therefor should be preferred IMHO.

Append instead of format

Since you are basically just appending something to self.base_url, you might as well append using + instead of using format. I find it a lot easier to read, but that depends a lot on what you prefer. Consequently:

self.base_url = 'https://hacker-news.firebaseio.com/v0/'


Then

uri = self.base_url.format(**{'api': 'item/{}.json'.format(item_id)})


would look like this if using append:

uri = self.base_url + "item/{}.json".format(item_id)


HackerNews.get()

The get() method's name does not explain what the method does. I'd suggest something offering a bit more documentation, for example: download_item() or download_entry().

On a little side note: You can remove the else: statement, as the raise Exception(..) command will only be executed if the if statement evaluates to False. This depends a bit on personal taste though. I'd also change the ordering like this:

if response.status_code != requests.codes.ok:
    raise Exception('HTTP Error {}'.format(response.status_code))
return json.loads(response.text)


I'd prefer this ordering as it has the return statement at the end, which is where I'd look first when searching for it.

SimpleFIFO

Maybe you can use Python's Queue class? See here for documentation: https://docs.python.org/2/library/queue.html

It also provides a max size, the only thing missing is the check for adding only strings.

HNStream.init()

You can use the super keyword (see also: https://stackoverflow.com/questions/576169/understanding-python-super-and-init-methods) instead of explicitly calling the constructor like this:

def __init__(self):
    super(HNSTream, self).__init__()


Or in Python 3:

def __init__(self):
    super().__init__()


HNStream.stream()

Just a few naming suggestions:

  • r -> result



  • iid -> item_id



That was it, hope it helps!

Code Snippets

def __init__(self, **kwargs):
    self.__dict__.update(kwargs)
def __init__(self, id, type, title): 
    self.type = type
    self.title = title
    self.id = id
    # Add other attributes that I don't know of?
self.base_url = 'https://hacker-news.firebaseio.com/v0/'
uri = self.base_url.format(**{'api': 'item/{}.json'.format(item_id)})
uri = self.base_url + "item/{}.json".format(item_id)

Context

StackExchange Code Review Q#70415, answer score: 3

Revisions (0)

No revisions yet.