
RateBeer.com scraper

Submitted by: @import:stackexchange-codereview

Problem

This was largely an exercise in making my code more Pythonic, especially in catching errors and doing things the right way.

I opted to make the PageNotFound exception part of the class so that users can simply write from ratebeer import RateBeer and not have to import anything else.
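Why nesting achieves that goal can be seen in a small, hypothetical stub. It is not the real scraper: its beer() method always raises, standing in for a page lookup that fails. Because PageNotFound is just a class attribute, callers reach it through the single imported name.

```python
class RateBeer(object):
    """Hypothetical stand-in for the scraper; only the exception matters here."""

    class PageNotFound(Exception):
        """Raised when a RateBeer.com page does not exist."""

    def beer(self, url):
        # Stub: a real scraper would fetch `url` and raise only when the page is missing.
        raise self.PageNotFound(url)


# Callers need only the one import (e.g. `from ratebeer import RateBeer`),
# because the exception is reachable as an attribute of the class.
def describe(url):
    try:
        return RateBeer().beer(url)
    except RateBeer.PageNotFound as exc:
        return "not found: {}".format(exc)
```

The trade-off is that the exception cannot be imported on its own without going through the class, which is exactly the single-name convenience the design aims for.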

If you prefer, the code is on GitHub.

```
from bs4 import BeautifulSoup
import requests
import re

import exceptions

class RateBeer():
    """
    Makes getting information about beers and breweries from RateBeer.com as easy as:
    >>> summit_epa = RateBeer().beer("summit extra pale ale")
    A utility for searching RateBeer.com, finding information about beers, breweries, and reviews.
    The nature of web scraping means that this package is offered in perpetual beta.
    Requires BeautifulSoup, Requests, and lxml.
    See https://github.com/alilja/ratebeer for the full README.
    """

    class PageNotFound(Exception):
        pass

    def __init__(self):
        self.BASE_URL = "http://www.ratebeer.com"

    def _search(self, query):
        # this feels bad to me
        # but if it fits, i sits
        payload = {"BeerName": query}
        r = requests.post(self.BASE_URL+"/findbeer.asp", data = payload)
        return BeautifulSoup(r.text, "lxml")

    def _parse(self, soup):
        s_results = soup.find_all('table',{'class':'results'})
        output = {"breweries":[],"beers":[]}
        beer_location = 0
        # find brewery information
        if any("brewers" in s for s in soup.find_all("h1")):
            s_breweries = s_results[0].find_all('tr')
            beer_location = 1
            for row in s_breweries:
                location = row.find('td',{'align':'right'})
                output['breweries'].append({
                    "name":row.a.contents,
                    "url":row.a.get('href'),
                    "id":re.search("/(?P<id>\d*)/",row.a.get('href')).group('id'),
                    "location":location.text.strip(),
                })
        # fi
```

Solution

Coding style

It's a bit hard to read this code because it doesn't follow PEP8.
The violations that stand out the most:

  • No space after commas and colons:
    • bad : {"breweries":[],"beers":[]}
    • good: {"breweries": [], "beers": []}
  • No line break after the colon of an if statement, and unconventional spacing in if statements, for example in if "ABV" in label.text: key = "abv"



These are everywhere in the code.
I suggest installing the pep8 command-line tool (pip install pep8; the project has since been renamed pycodestyle),
running it on your project, and correcting all the violations.

Even with all PEP8 violations fixed,
the code would benefit from more generous use of vertical spacing.
For example, the beer and reviews methods are too dense.
An occasional blank line would visually group tightly related statements
and separate them from loosely related ones.

Mutually exclusive if statements

It seems to me that these conditions are mutually exclusive:

if "RATINGS" in label.text:  key = "num_ratings"
if "CALORIES" in label.text: key = "calories"
if "ABV" in label.text:      key = "abv"
if "SEASONAL" in label.text: key = "season"
if "IBU" in label.text:      key = "ibu"


Since at most one of them can match, it's wasteful to make the program evaluate all five.
These should be chained together with elif, so that evaluation stops at the first match.
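A minimal sketch of the chained version, pulled out into a hypothetical label_to_key() helper that takes the label's text (label.text in the original) and returns None when nothing matches. The None fallback is an assumption, since the original's handling of unmatched labels isn't shown. A table-driven variant is included for comparison. Both return the first matching key, whereas the original chain of separate ifs keeps the last one; they agree only because the labels are mutually exclusive.

```python
def label_to_key(text):
    """Return the output key for a label's text, or None if unrecognised."""
    if "RATINGS" in text:
        key = "num_ratings"
    elif "CALORIES" in text:
        key = "calories"
    elif "ABV" in text:
        key = "abv"
    elif "SEASONAL" in text:
        key = "season"
    elif "IBU" in text:
        key = "ibu"
    else:
        key = None  # assumption: unmatched labels map to None
    return key


# Table-driven alternative: the (marker, key) pairs are checked in order,
# so supporting a new label means adding one line instead of one more branch.
LABEL_KEYS = (
    ("RATINGS", "num_ratings"),
    ("CALORIES", "calories"),
    ("ABV", "abv"),
    ("SEASONAL", "season"),
    ("IBU", "ibu"),
)


def label_to_key_table(text):
    """Same mapping as label_to_key(), driven by LABEL_KEYS."""
    for marker, key in LABEL_KEYS:
        if marker in text:
            return key
    return None
```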

Don't repeat yourself

This piece of code appears in many places:

soup = BeautifulSoup(r.text, "lxml")


It would be better to create a helper function for this:

def get_soup(text):
    return BeautifulSoup(text, "lxml")
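A slightly extended sketch of the same helper, assuming you may also want to swap parsers. The parser keyword is a hypothetical addition, not part of the original code: it defaults to "lxml" as the scraper does, but also accepts Python's built-in "html.parser" when lxml is not installed.

```python
from bs4 import BeautifulSoup


def get_soup(text, parser="lxml"):
    """Parse an HTML string into a BeautifulSoup tree.

    `parser` is a hypothetical extension of the helper; it defaults to
    "lxml", matching the original scraper.
    """
    return BeautifulSoup(text, parser)


# Every call site that used to read
#     soup = BeautifulSoup(r.text, "lxml")
# becomes
#     soup = get_soup(r.text)
if __name__ == "__main__":
    soup = get_soup("<h1>Brewers</h1>", parser="html.parser")
    print(soup.h1.text)
```

Keeping the parser choice in one place also means a later switch (say, to html5lib) touches a single line.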


Other issues

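Remove unused imports (the exceptions module exists only in Python 2, and everything in it is already available as a builtin):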

import exceptions


The example in this docstring is wrong, because beer() takes the URL of a beer page rather than a search term:

>>> summit_epa = RateBeer().beer("summit extra pale ale")


It should be:

>>> RateBeer().search("summit extra pale ale")
>>> summit_epa = RateBeer().beer("/beer/summit-extra-pale-ale/7344/")


New-style classes should extend object (this matters on Python 2; on Python 3 every class is new-style):

class RateBeer(object):
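A quick check of what the change means in practice. The question's code imports the Python 2-only exceptions module, so it targets Python 2, where class RateBeer(): creates an old-style (classic) class. On Python 3 the two spellings below are equivalent; the class names are placeholders.

```python
# Placeholder classes; only the declaration syntax differs.
class EmptyParens():            # how the question declares RateBeer
    pass


class ExplicitObject(object):   # what the review recommends
    pass


# On Python 3 both are new-style classes: their metaclass is `type`
# and `object` sits at the end of the method resolution order.
for cls in (EmptyParens, ExplicitObject):
    print(cls.__name__, type(cls) is type, cls.__mro__[-1] is object)
```

Writing (object) keeps the class new-style if the scraper still runs on Python 2, and it is harmless on Python 3.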


Context

StackExchange Code Review Q#69909, answer score: 5
