Python Politico API attempt
Problem
I love politics, and I love programming, so I figured: why not combine the two for something to do? I'm making a work-in-progress (but runnable at this stage) Politico API that I call "pylitico":
```
from bs4 import BeautifulSoup
import requests
import time
import ast
import re

story_link = re.compile('a href="(http:\/\/www.politico.com\/story.*)" target')
utag_regex = re.compile('var utag_data = \n(\{.*);')
today = time.strftime("%m/%d/%y")


class Article():
    def __init__(self, content_id, tags, author,
                 datestamp, section, headline, story):
        """
        :type tags: list
        :type content_id: str
        :type author: list
        :type datestamp: DateTime
        :type section: str
        :type headline: str
        :type story: str
        """
        self.content_id = content_id
        self.tags = tags
        self.author = author
        self.datestamp = datestamp
        self.section = section
        self.headline = headline
        self.story = story

    def __str__(self):
        return "{0}".format(self.headline)


class Pylitico():
    def __init__(self):
        """Creates a connection to Politico"""
        self.session = requests.Session()

    def most_read(self):
        """Collects the Most Read section of Politico, returns
        stories as list of Article class objects"""
        r = self.session.get('http://www.politico.com/congress/?tab=most-read')
        soup = BeautifulSoup(r.content, 'html.parser')
        most_read_frame = [i for i in soup.find_all('div',
                           {'class': 'dari-frame dari-frame-loaded'}) if
                           'most-read' in i.attrs.get('name')][0]
        links = [i.find('a').attrs.get('href') for i in
                 most_read_frame.find_all('article', {'class': 'story-frag format-xxs'})]
        stories = [self.story_parser(link) for link in links]
        return stories

    def todays_stories(self):
        """Collec
```
Solution
This expression:

    most_read_frame = [i for i in soup.find_all('div',
                       {'class': 'dari-frame dari-frame-loaded'}) if
                       'most-read' in i.attrs.get('name')][0]

can be made more efficient by using islice:

    from itertools import islice

    most_read_frame_gen = (i for i in soup.find_all('div',
                           {'class': 'dari-frame dari-frame-loaded'}) if
                           'most-read' in i.attrs.get('name'))
    most_read_frame = next(islice(most_read_frame_gen, 0, 1))

as the generator stops filtering after it yields the first match. Note that islice returns an iterator, so next() is needed to pull out the frame itself. Also, soup.find_all still builds the full list of divs, so the saving is only in the filtering step.
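The same early exit can be had without itertools. A minimal sketch: next() on a generator expression stops at the first match, and its second argument supplies a fallback instead of raising when nothing matches. The dictionaries below are hypothetical stand-ins for the parsed div tags, and first_most_read and tracked are illustrative names, not part of pylitico.

```python
def first_most_read(frames):
    """Return the first frame whose name contains 'most-read', or None."""
    return next(
        (frame for frame in frames
         # .get('name', '') avoids a TypeError when a frame has no name
         if 'most-read' in frame.get('name', '')),
        None,
    )


examined = []


def tracked(frames):
    """Yield frames one by one, recording each one that gets examined."""
    for frame in frames:
        examined.append(frame)
        yield frame


# Hypothetical stand-ins for the attribute dicts of the parsed <div> tags.
frames = [
    {'name': 'top-stories'},
    {},                            # no name attribute at all
    {'name': 'tab-most-read'},
    {'name': 'most-read-footer'},  # never examined
]

match = first_most_read(tracked(frames))
print(match)          # {'name': 'tab-most-read'}
print(len(examined))  # 3
```

Only three of the four frames are examined, and an empty input gives None rather than the IndexError that `[...][0]` would raise.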
Also, this is a bit of bad form:

    for _ in most_read_stories[0:1]:
        print(_.headline)

By convention, _ is reserved for throwaway variables, that is, values that are never read. Here the loop variable is used, so it'd be more readable to call it something like story, or even just s:

    for story in most_read_stories[0:1]:
        print(story.headline)

In general, though, it looks good. You do realize you're in a bit of an "arms race", though, right? If Politico changes the format of its site, you'll have to change your code to match. In that vein, I suggest you document the date on which you last verified that it works, so potential users can judge whether it's too out of date to be worth bothering with.
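One way to follow that last suggestion is to keep the verification date in the module itself. This is only a sketch: LAST_VERIFIED, MAX_AGE_DAYS, and check_freshness are hypothetical names, and the date shown is a placeholder, not the date the original code was actually tested.

```python
import datetime
import warnings

# Placeholder date (an assumption for illustration): update it whenever
# the scraper is re-tested against the live site.
LAST_VERIFIED = datetime.date(2016, 8, 15)
MAX_AGE_DAYS = 90  # arbitrary threshold, also an assumption


def check_freshness(today=None):
    """Warn if the scraper has not been verified recently.

    Returns the number of days since LAST_VERIFIED.
    """
    today = today or datetime.date.today()
    age = (today - LAST_VERIFIED).days
    if age > MAX_AGE_DAYS:
        warnings.warn(
            "Selectors last verified {0} days ago; the site layout "
            "may have changed.".format(age)
        )
    return age
```

Calling check_freshness() once, for example when the scraper class is constructed, would give users an early hint that failures may come from a layout change and not from their own code.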
Context
StackExchange Code Review Q#138763, answer score: 3