patternpythonMinor
Streaming hacker news posts
Viewed 0 times
newspostsstreaminghacker
Problem
Hacker News recently released an official API. Unfortunately, it cannot be used to get the newest posts submitted. I wrote a script to provide a faux streaming-api like method (like praw reddit's
```
"""
Usage:
from hn import HNStream
h = HNStream()
for item in h.stream():
print "Title: ", item.title
print "By user: ", item.by
"""
from lxml import html as lh
import re
import requests
import threading
import time
import ujson as json
__all__ = ['HNStream']
class Item:
def __init__(self, **kwargs):
self.__dict__.update(kwargs)
def __str__(self):
return "{}: {}".format(self.type, self.title)
__repr__ = __str__
class HackerNews:
def __init__(self):
self.base_url = 'https://hacker-news.firebaseio.com/v0/{api}'
def get(self, uri):
response = requests.get(uri)
if response.status_code == requests.codes.ok:
return json.loads(response.text)
else:
raise Exception('HTTP Error {}'.format(response.status_code))
def item(self, item_id):
uri = self.base_url.format(**{'api': 'item/{}.json'.format(item_id)})
return Item(**self.get(uri))
class SimpleFIFO:
def __init__(self, length):
self.length = length
self.values = length * [None]
def contains(self, iid):
return iid in self.values
def append(self, value):
assert isinstance(value, str), "Only append strings"
self.values.append(value)
while len(self.values) > self.length:
self.values.pop(0)
assert len(self.values) == self.length
def __str__(self):
return str(self.values)
class HNStream(threading.Thread):
def __init__(self):
threading.Thread.__init__(self)
def stream(self):
itembuffer = SimpleFIFO(length=30)
while True:
submission_stream) so that I can simply put it in a loop and do something like make a gtk-notification when title contains the string 'python'.```
"""
Usage:
from hn import HNStream
h = HNStream()
for item in h.stream():
print "Title: ", item.title
print "By user: ", item.by
"""
from lxml import html as lh
import re
import requests
import threading
import time
import ujson as json
__all__ = ['HNStream']
class Item:
def __init__(self, **kwargs):
self.__dict__.update(kwargs)
def __str__(self):
return "{}: {}".format(self.type, self.title)
__repr__ = __str__
class HackerNews:
def __init__(self):
self.base_url = 'https://hacker-news.firebaseio.com/v0/{api}'
def get(self, uri):
response = requests.get(uri)
if response.status_code == requests.codes.ok:
return json.loads(response.text)
else:
raise Exception('HTTP Error {}'.format(response.status_code))
def item(self, item_id):
uri = self.base_url.format(**{'api': 'item/{}.json'.format(item_id)})
return Item(**self.get(uri))
class SimpleFIFO:
def __init__(self, length):
self.length = length
self.values = length * [None]
def contains(self, iid):
return iid in self.values
def append(self, value):
assert isinstance(value, str), "Only append strings"
self.values.append(value)
while len(self.values) > self.length:
self.values.pop(0)
assert len(self.values) == self.length
def __str__(self):
return str(self.values)
class HNStream(threading.Thread):
def __init__(self):
threading.Thread.__init__(self)
def stream(self):
itembuffer = SimpleFIFO(length=30)
while True:
Solution
I have a number of suggestions:
Don't use self.dict
It is pretty unclear which attributes the class will have after calling the constructor, as it is basically up to the caller to decide this. This is not optimal, as you want to be in control of what data is set in an item. I'd suggest to use named arguments instead:
While this may seem like blowing up the code, it makes it a lot easier to understand for an uninformed reader, and therefor should be preferred IMHO.
Append instead of format
Since you are basically just appending something to
Then
would look like this if using append:
HackerNews.get()
The
On a little side note: You can remove the
I'd prefer this ordering as it has the
SimpleFIFO
Maybe you can use Python's
It also provides a max size, the only thing missing is the check for adding only strings.
HNStream.init()
You can use the
Or in Python 3:
HNStream.stream()
Just a few naming suggestions:
That was it, hope it helps!
Don't use self.dict
def __init__(self, **kwargs):
self.__dict__.update(kwargs)It is pretty unclear which attributes the class will have after calling the constructor, as it is basically up to the caller to decide this. This is not optimal, as you want to be in control of what data is set in an item. I'd suggest to use named arguments instead:
def __init__(self, id, type, title):
self.type = type
self.title = title
self.id = id
# Add other attributes that I don't know of?While this may seem like blowing up the code, it makes it a lot easier to understand for an uninformed reader, and therefor should be preferred IMHO.
Append instead of format
Since you are basically just appending something to
self.base_url, you might as well append using + instead of using format. I find it a lot easier to read, but that depends a lot on what you prefer. Consequently:self.base_url = 'https://hacker-news.firebaseio.com/v0/'Then
uri = self.base_url.format(**{'api': 'item/{}.json'.format(item_id)})would look like this if using append:
uri = self.base_url + "item/{}.json".format(item_id)HackerNews.get()
The
get() method's name does not explain what the method does. I'd suggest something offering a bit more documentation, for example: download_item() or download_entry().On a little side note: You can remove the
else: statement, as the raise Exception(..) command will only be executed if the if statement evaluates to False. This depends a bit on personal taste though. I'd also change the ordering like this:if response.status_code != requests.codes.ok:
raise Exception('HTTP Error {}'.format(response.status_code))
return json.loads(response.text)I'd prefer this ordering as it has the
return statement at the end, which is where I'd look first when searching for it.SimpleFIFO
Maybe you can use Python's
Queue class? See here for documentation: https://docs.python.org/2/library/queue.htmlIt also provides a max size, the only thing missing is the check for adding only strings.
HNStream.init()
You can use the
super keyword (see also: https://stackoverflow.com/questions/576169/understanding-python-super-and-init-methods) instead of explicitly calling the constructor like this:def __init__(self):
super(HNSTream, self).__init__()Or in Python 3:
def __init__(self):
super().__init__()HNStream.stream()
Just a few naming suggestions:
r-> result
iid-> item_id
That was it, hope it helps!
Code Snippets
def __init__(self, **kwargs):
self.__dict__.update(kwargs)def __init__(self, id, type, title):
self.type = type
self.title = title
self.id = id
# Add other attributes that I don't know of?self.base_url = 'https://hacker-news.firebaseio.com/v0/'uri = self.base_url.format(**{'api': 'item/{}.json'.format(item_id)})uri = self.base_url + "item/{}.json".format(item_id)Context
StackExchange Code Review Q#70415, answer score: 3
Revisions (0)
No revisions yet.