Print the list of Winter Bash 2014 hats as a list of checkboxes in GFM format
Problem
In Winter Bash 2014,
since there is no easy way to see the hats I'm missing per site,
I decided to use Gists for that.
A perhaps not so well-known feature of the GitHub Flavored Markdown (GFM) format used everywhere on GitHub is that if you write bullet point lists with embedded [ ] (task list items) like this:

```
- [ ] Bill Lumbergh: answer 5 questions on Saturday (UTC)
- [ ] On The Road: ask using the app
- [ ] Bugdroid: use the Android app
- [ ] Not a cherry: use the iOS app
- [ ] I Voted Today: vote limit and 10+ questions
```

they will be rendered not as regular bullet point lists but as a list of checkboxes:
clicking on an item will toggle the checkbox, triggering a commit.
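Once some boxes are ticked in the gist, the list itself records which hats are still missing. A minimal sketch, assuming the gist's Markdown is already in a string (fetching it is out of scope here), that picks out the unchecked items; the function name unchecked_items is hypothetical, and the checked states in the sample are made up. GFM treats both [x] and [X] as checked.

```python
import re

# Simplified GFM task list item: a "-", "*" or "+" bullet, then "[ ]", "[x]" or "[X]".
# Ordered-list task items ("1. [ ] ...") are deliberately ignored in this sketch.
TASK_ITEM = re.compile(r'^\s*[-*+]\s+\[([ xX])\]\s+(.*)$')


def unchecked_items(markdown_text):
    """Return the text of every task list item whose box is still empty."""
    missing = []
    for line in markdown_text.splitlines():
        match = TASK_ITEM.match(line)
        if match and match.group(1) == ' ':
            missing.append(match.group(2))
    return missing


gist = """\
- [x] Bill Lumbergh: answer 5 questions on Saturday (UTC)
- [ ] On The Road: ask using the app
- [X] Bugdroid: use the Android app
- [ ] Not a cherry: use the iOS app
"""
print(unchecked_items(gist))
# ['On The Road: ask using the app', 'Not a cherry: use the iOS app']
```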
Just one little problem:
there's no obviously easy way to copy-paste the hat names and descriptions from the Winter Bash website.
So I cooked up this simple soup (pun totally intended) using Python's Beautiful Soup:
```python
#!/usr/bin/env python

from urllib import request
from bs4 import BeautifulSoup
import os

URL = 'http://winterbash2014.stackexchange.com/'


def load_html_doc(url):
    cached_file = 'page.html'
    if os.path.isfile(cached_file) and os.path.getsize(cached_file):
        with open(cached_file) as fh:
            html_doc = fh.read()
    else:
        html_doc = request.urlopen(url).read()
        with open(cached_file, 'wb') as fh:
            fh.write(html_doc)
    return html_doc


def get_soup(url):
    html_doc = load_html_doc(url)
    return BeautifulSoup(html_doc)


def print_hats_as_gfm_checkboxes(soup):
    for box in soup.find_all('a', attrs={'class': 'box'}):
        name = box.find(attrs={'class': 'hat-name'}).text
        description = box.find(attrs={'class': 'hat-description'}).text
        print('- [ ] {}: {}'.format(name, description))


def print_pretty(soup):
    print(soup.prettify())


def main():
    soup = get_soup(URL)
    print_hats_as_gfm_checkboxes(soup)
    # print_pretty(soup)


if __name__ == '__main__':
    main()
```

This produces a list of all the hats in the r…
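If Beautiful Soup isn't installed, the same extraction can be approximated with the standard library's html.parser.HTMLParser, which is a different technique from the Soup-based one above. This is a sketch under assumptions: it relies on the markup the script's selectors imply (an <a class="box"> wrapping elements with classes hat-name and hat-description, each holding plain text), and the sample HTML as well as the names HatParser and hats_as_gfm_checkboxes are made up for illustration.

```python
from html.parser import HTMLParser


class HatParser(HTMLParser):
    """Collect (name, description) pairs from <a class="box"> elements.

    Assumption: each box holds one "hat-name" and one "hat-description"
    element containing plain text (no nested tags).
    """

    def __init__(self):
        super().__init__()
        self.hats = []
        self._in_box = False
        self._field = None  # 'hat-name' or 'hat-description' while inside one
        self._current = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get('class') or '').split()
        if tag == 'a' and 'box' in classes:
            self._in_box = True
            self._current = {'hat-name': '', 'hat-description': ''}
        elif self._in_box:
            for field in ('hat-name', 'hat-description'):
                if field in classes:
                    self._field = field

    def handle_endtag(self, tag):
        if tag == 'a' and self._in_box:
            self.hats.append((self._current['hat-name'].strip(),
                              self._current['hat-description'].strip()))
            self._in_box = False
        self._field = None  # any closing tag ends the current text field

    def handle_data(self, data):
        if self._in_box and self._field:
            self._current[self._field] += data  # data may arrive in chunks


def hats_as_gfm_checkboxes(html_doc):
    parser = HatParser()
    parser.feed(html_doc)
    parser.close()
    return ['- [ ] {}: {}'.format(name, desc) for name, desc in parser.hats]


# Hypothetical markup, shaped after the selectors used in the script.
sample = '''
<a class="box" href="#"><span class="hat-name">Bugdroid</span>
<span class="hat-description">use the Android app</span></a>
<a class="box" href="#"><span class="hat-name">Not a cherry</span>
<span class="hat-description">use the iOS app</span></a>
'''
for line in hats_as_gfm_checkboxes(sample):
    print(line)
```

Beautiful Soup is still the more forgiving choice for real-world pages; this version only shows that the selector logic itself is simple.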
Solution
Don't do this:
```
#!/usr/bin/env python
```

Some people make the poor excuse that PEP 394 recommends that python continue to refer to python2 for the time being, but this doesn't even apply to you since yours only supports Python 3!

Use this instead:

```
#!/usr/bin/env python3
```

Your code could do with docstrings; at least load_html_doc could. These don't need to be long.

There's nothing wrong with caching files locally, but I would suggest writing to tempfile.gettempdir().

request.urlopen returns a file-like object. Moments later you use a with statement on a file object, but you neglect to do so on the one request.urlopen returns!

I would do an early return in load_html_doc.

Your os.path.isfile(cached_file) check is strange: if cached_file is a directory instead, you don't actually deal with the resulting problem. It's not a big deal, but I would use os.path.exists so it doesn't look like you're actually checking.

cached_file would be better as a default argument, and I'd call it cache_file or even cache_path. IMHO fh isn't that great a name either.

Here's the updated code, still writing to the same file:
```python
def load_html_doc(url, cache_file='page.html'):
    if os.path.exists(cache_file) and os.path.getsize(cache_file):
        with open(cache_file) as cache:
            return cache.read()

    with request.urlopen(url) as webpage:
        html_doc = webpage.read()

    with open(cache_file, 'wb') as cache:
        cache.write(html_doc)

    return html_doc
```

After all of the comments on load_html_doc, I'm sad (happy?) to say there isn't much to talk about with the rest of it. I would remove the # print_pretty(soup) comment, though.
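The reviewer's version keeps page.html in the working directory. A sketch that also applies the docstring and tempfile.gettempdir() suggestions might look like this. DEFAULT_CACHE is a hypothetical name, and opening the cache in 'rb' mode is an extra change not taken from the review: it makes both the cached and the freshly downloaded paths return bytes, whereas the original returns str from the cache and bytes from the network.

```python
import os
import tempfile
from urllib import request

# Hypothetical default: cache under the system temp dir instead of the CWD.
DEFAULT_CACHE = os.path.join(tempfile.gettempdir(), 'page.html')


def load_html_doc(url, cache_file=DEFAULT_CACHE):
    """Return the raw HTML at url, caching it in cache_file.

    A non-empty cache file is reused instead of hitting the network.
    Both paths return bytes.
    """
    if os.path.exists(cache_file) and os.path.getsize(cache_file):
        with open(cache_file, 'rb') as cache:
            return cache.read()

    # The response object is a context manager, so it gets closed too.
    with request.urlopen(url) as webpage:
        html_doc = webpage.read()

    with open(cache_file, 'wb') as cache:
        cache.write(html_doc)

    return html_doc
```

Returning bytes everywhere is harmless for the rest of the script, since BeautifulSoup accepts bytes and detects the encoding itself.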
StackExchange Code Review Q#74791, answer score: 3