Python Politico API attempt

Submitted by: @import:stackexchange-codereview

Problem

I love politics, and I love programming, so I figured why not combine the two for something to do? I'm making a work-in-progress (but runnable at this stage) Politico API that I call "pylitico":

```
from bs4 import BeautifulSoup
import requests
import time
import ast
import re

story_link = re.compile('a href="(http:\/\/www.politico.com\/story.*)" target')
utag_regex = re.compile('var utag_data = \n(\{.*);')

today = time.strftime("%m/%d/%y")

class Article():
    def __init__(self, content_id, tags, author,
                 datestamp, section, headline, story):
        """
        :type tags: list
        :type content_id: str
        :type author: list
        :type datestamp: DateTime
        :type section: str
        :type headline: str
        :type story: str
        """

        self.content_id = content_id
        self.tags = tags
        self.author = author
        self.datestamp = datestamp
        self.section = section
        self.headline = headline
        self.story = story

    def __str__(self):
        return "{0}".format(self.headline)

class Pylitico():
    def __init__(self):
        """Creates a connection to Politico"""
        self.session = requests.Session()

    def most_read(self):
        """Collects the Most Read section of Politico, returns
        stories as list of Article class objects"""
        r = self.session.get('http://www.politico.com/congress/?tab=most-read')
        soup = BeautifulSoup(r.content, 'html.parser')
        most_read_frame = [i for i in soup.find_all('div',
                                                    {'class': 'dari-frame dari-frame-loaded'}) if
                           'most-read' in i.attrs.get('name')][0]
        links = [i.find('a').attrs.get('href') for i in
                 most_read_frame.find_all('article', {'class': 'story-frag format-xxs'})]
        stories = [self.story_parser(link) for link in links]
        return stories

    def todays_stories(self):
        """Collec
```

Solution

```
most_read_frame = [i for i in soup.find_all('div',
                                                {'class': 'dari-frame dari-frame-loaded'}) if
                       'most-read' in i.attrs.get('name')][0]
```

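can be made more efficient by using a generator expression with islice:

```
from itertools import islice
most_read_frame_gen = (i for i in soup.find_all('div',
                                                {'class': 'dari-frame dari-frame-loaded'}) if
                       'most-read' in i.attrs.get('name'))
most_read_frame = next(islice(most_read_frame_gen, 0, 1))
```

as the generator stops filtering once it has yielded the first match, instead of testing every div and then discarding all but one. Note that islice returns an iterator rather than the element itself, so next() is needed to pull the frame out of it; next(most_read_frame_gen) on its own would work just as well.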
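To make the "stops early" point concrete, here is a minimal sketch that runs without BeautifulSoup. The frames list and the is_most_read helper are made up for illustration: each dict stands in for a tag's attrs mapping, and real code would iterate over soup.find_all(...) instead. It also uses .get('name', '') so that a div without a name attribute does not raise a TypeError, which is an assumption about the desired behaviour rather than something the original code does.

```python
# Hypothetical stand-ins for the parsed <div> elements; each dict plays the
# role of a tag's attrs mapping.
frames = [
    {"name": "top-stories"},
    {},                            # a div with no name attribute
    {"name": "most-read-tab"},
    {"name": "most-read-footer"},  # never examined, see below
]

examined = []  # records every frame the filter actually looked at


def is_most_read(attrs):
    """Return True when the frame's name mentions 'most-read'."""
    examined.append(attrs)
    # The '' default avoids "argument of type 'NoneType' is not iterable"
    # when the attribute is missing.
    return "most-read" in attrs.get("name", "")


# Generator expression: nothing is evaluated until a value is requested.
matches = (attrs for attrs in frames if is_most_read(attrs))

# next() pulls the first match; the None default avoids StopIteration
# when nothing matches at all.
first = next(matches, None)

print(first)          # {'name': 'most-read-tab'}
print(len(examined))  # 3: the fourth frame was never tested
```

With the list-comprehension-plus-[0] version, all four frames would be tested before the first one is picked out.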

Also this is a bit of bad form:

```
for _ in most_read_stories[0:1]:
    print(_.headline)
```

_ is used for throwaway variables by convention. It'd be more readable to call it something like story or even just s:

```
for story in most_read_stories[0:1]:
    print(story.headline)
```
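
For contrast, a small sketch of where _ does fit the convention, namely when the value is never read. The headlines list is made up for illustration.

```python
headlines = ["First", "Second", "Third"]  # hypothetical data

# Unpacking: only the first item matters, the rest is deliberately ignored.
first, *_ = headlines

# Loop counter that is never read inside the loop body.
attempts = 0
for _ in range(3):
    attempts += 1

# As soon as the value is used, give it a real name.
shown = [headline.upper() for headline in headlines[0:1]]
```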


In general, though, it looks good. You do realize you're in a bit of an 'arms race' though, right? If Politico changes the format of its site, you'll have to change your code, etc, etc. In that vein, I suggest you document what date you made it work, so potential users can judge whether it's too out of date to be worth bothering with.
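
One way to act on that suggestion, sketched under assumptions: keep the "last verified" date in the module itself and fail with a clear message when the expected markup is missing. LAST_VERIFIED, MAX_AGE_DAYS, staleness_note and first_or_fail are all hypothetical names, and the date below is a placeholder, not the date the original code was written.

```python
from datetime import date

# Placeholder value: set this to the date the selectors were last checked
# by hand against the live site.
LAST_VERIFIED = date(2000, 1, 1)
MAX_AGE_DAYS = 180  # arbitrary threshold chosen for this sketch


def staleness_note(last_verified, today, max_age_days=MAX_AGE_DAYS):
    """Return a warning string if the last check is too old, else None."""
    age = (today - last_verified).days
    if age > max_age_days:
        return ("Selectors last verified {0} ({1} days ago); "
                "the site layout may have changed.").format(
                    last_verified.isoformat(), age)
    return None


def first_or_fail(matches, description):
    """Return the first item of matches, or raise with a readable message
    instead of a bare IndexError or StopIteration."""
    for match in matches:
        return match
    raise LookupError(
        "Could not find {0}; the page structure may have changed since {1}."
        .format(description, LAST_VERIFIED.isoformat()))


note = staleness_note(LAST_VERIFIED, date.today())
if note:
    print(note)
```

A caller could then write first_or_fail(most_read_frame_gen, "the most-read frame"), so that a redesign of the site surfaces as an explicit error rather than an index error deep inside the parser.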

Context

StackExchange Code Review Q#138763, answer score: 3
