RateBeer.com scraper
Problem
This was largely an exercise in making my code more Pythonic, especially in catching errors and doing things the right way.
I opted to make the PageNotFound exception part of the class so that users could simply `from ratebeer import RateBeer` and not have to worry about anything else.
If you prefer, the code is on GitHub.
```
from bs4 import BeautifulSoup
import requests
import re
import exceptions

class RateBeer():
    """
    Makes getting information about beers and breweries from RateBeer.com as easy as:
    >>> summit_epa = RateBeer().beer("summit extra pale ale")
    A utility for searching RateBeer.com, finding information about beers, breweries, and reviews.
    The nature of web scraping means that this package is offered in perpetual beta.
    Requires BeautifulSoup, Requests, and lxml.
    See https://github.com/alilja/ratebeer for the full README.
    """
    class PageNotFound(Exception):
        pass

    def __init__(self):
        self.BASE_URL = "http://www.ratebeer.com"

    def _search(self, query):
        # this feels bad to me
        # but if it fits, i sits
        payload = {"BeerName": query}
        r = requests.post(self.BASE_URL+"/findbeer.asp", data = payload)
        return BeautifulSoup(r.text, "lxml")

    def _parse(self, soup):
        s_results = soup.find_all('table',{'class':'results'})
        output = {"breweries":[],"beers":[]}
        beer_location = 0
        # find brewery information
        if any("brewers" in s for s in soup.find_all("h1")):
            s_breweries = s_results[0].find_all('tr')
            beer_location = 1
            for row in s_breweries:
                location = row.find('td',{'align':'right'})
                output['breweries'].append({
                    "name":row.a.contents,
                    "url":row.a.get('href'),
                    "id":re.search("/(?P<id>\d*)/",row.a.get('href')).group('id'),
                    "location":location.text.strip(),
                })
        # fi
```
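Nesting `PageNotFound` inside the class, as described above, means callers can reach the exception through `RateBeer` itself. A minimal sketch of that usage follows. The class here is a network-free stand-in, and both the body of `beer` and the `lookup` helper are hypothetical, not the scraper's real logic.

```python
class RateBeer(object):
    """Stand-in for the real scraper; only the nested exception matters here."""

    class PageNotFound(Exception):
        pass

    def beer(self, url):
        # Hypothetical behaviour: pretend every URL leads to a missing page.
        raise RateBeer.PageNotFound(url)


def lookup(url):
    """Return beer data, or None when the page does not exist."""
    try:
        return RateBeer().beer(url)
    except RateBeer.PageNotFound:
        # The exception is reachable via the class, so no extra import is needed.
        return None


print(lookup("/beer/does-not-exist/0/"))
```

The trade-off is that the exception can only be named as `RateBeer.PageNotFound`, which is slightly longer to type than a module-level exception would be.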
Solution
Coding style
This code is a bit hard to read because it doesn't follow PEP8.
The violations that stand out the most:

- No space after commas (or after the colons in dict literals):
  - bad: `{"breweries":[],"beers":[]}`
  - good: `{"breweries": [], "beers": []}`
- No line break after `:` and unconventional spacing in `if` statements, for example in `if "ABV" in label.text: key = "abv"`

These are everywhere in the code.
I suggest installing the `pep8` command line tool (`pip install pep8`; the tool has since been renamed `pycodestyle`), running it on your project, and correcting all the violations.

Even with all PEP8 violations fixed, the code would benefit from more generous use of vertical spacing.
For example, the `beer` and `reviews` methods are too dense.
It would be better to add blank lines here and there to visually group tightly related code and separate it from loosely related code.
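As an illustration of the spacing advice above, here is a sketch of how the brewery dictionary built in `_parse` might look with PEP8 spacing and a blank line between steps. The function name `brewery_from_row` and its plain-string parameters are assumptions, chosen so the example runs without BeautifulSoup. The pattern also uses a raw string with `\d+`, a small deviation from the original `\d*`.

```python
import re


def brewery_from_row(name, href, location_text):
    """Build one brewery entry from fields already pulled out of a table row."""
    # Step 1: extract the numeric id from a URL such as "/brewers/some-name/123/".
    match = re.search(r"/(?P<id>\d+)/", href)
    brewery_id = match.group("id") if match else None

    # Step 2: assemble the result, with a space after every colon and comma.
    return {
        "name": name,
        "url": href,
        "id": brewery_id,
        "location": location_text.strip(),
    }


print(brewery_from_row("Example Brewery", "/brewers/example-brewery/123/", "  Springfield "))
```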
Mutually exclusive if statements

It seems to me that these conditions are mutually exclusive:

    if "RATINGS" in label.text: key = "num_ratings"
    if "CALORIES" in label.text: key = "calories"
    if "ABV" in label.text: key = "abv"
    if "SEASONAL" in label.text: key = "season"
    if "IBU" in label.text: key = "ibu"

As such, it's a waste to make the program evaluate them all unnecessarily.
These should be chained together with `elif`.

Don't repeat yourself

This piece of code appears in many places:

    soup = BeautifulSoup(r.text, "lxml")

It would be better to create a helper method for this:

    def get_soup(text):
        return BeautifulSoup(text, "lxml")

Other issues

Remove unused imports:

    import exceptions

This docstring is wrong:

    >>> summit_epa = RateBeer().beer("summit extra pale ale")

It should have been:

    >>> RateBeer().search("summit extra pale ale")
    >>> summit_epa = RateBeer().beer("/beer/summit-extra-pale-ale/7344/")

New-style classes should extend `object`:

    class RateBeer(object):
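To make the `elif` suggestion for the mutually exclusive conditions concrete, here is a minimal sketch. The function name `key_for_label`, its plain-string argument, and the `None` fallback are assumptions for illustration; in the scraper the text would come from `label.text`.

```python
def key_for_label(label_text):
    """Map a stats label to its output key, or None if unrecognised."""
    # The branches are mutually exclusive, so evaluation stops at the first match.
    if "RATINGS" in label_text:
        key = "num_ratings"
    elif "CALORIES" in label_text:
        key = "calories"
    elif "ABV" in label_text:
        key = "abv"
    elif "SEASONAL" in label_text:
        key = "season"
    elif "IBU" in label_text:
        key = "ibu"
    else:
        key = None  # assumption: unknown labels are ignored
    return key


for text in ("ABV", "EST. CALORIES", "STYLE"):
    print(text, "->", key_for_label(text))
```

One behavioural difference to keep in mind: if a label ever matched two conditions, the `elif` chain would keep the first match, whereas the original sequence of independent `if` statements would keep the last.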
Context
StackExchange Code Review Q#69909, answer score: 5