Print the list of winter bash 2014 hats as a list of checkboxes in GFM format

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Problem

In Winter Bash 2014,
since there is no easy way to see the hats I'm missing per site,
I decided to use Gists for that.
A perhaps not so well-known feature of the GitHub Flavored Markdown (GFM) format used everywhere on GitHub is that if you write bullet point lists with an embedded [ ] like this:

- [ ] Bill Lumbergh: answer 5 questions on Saturday (UTC)
- [ ] On The Road: ask using the app
- [ ] Bugdroid: use the Android app
- [ ] Not a cherry: use the iOS app
- [ ] I Voted Today: vote limit and 10+ questions

These will be rendered not as a regular bullet point list but as a list of checkboxes:
clicking on an item toggles the checkbox, triggering a commit.
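
As a minimal sketch of the line format described above (the helper name gfm_checkbox and its done parameter are hypothetical, not part of the script below), a task-list item could be built like this, with [x] marking an item that is already checked:

```python
def gfm_checkbox(name, description, done=False):
    """Format one hat as a GFM task-list item (hypothetical helper)."""
    # GFM renders "- [ ]" as an unchecked box and "- [x]" as a checked one.
    mark = 'x' if done else ' '
    return '- [{}] {}: {}'.format(mark, name, description)


print(gfm_checkbox('Bugdroid', 'use the Android app'))
print(gfm_checkbox('On The Road', 'ask using the app', done=True))
```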

Just one little problem:
there's no obviously easy way to copy-paste the hat names and descriptions from the Winter Bash website.
So I cooked up this simple soup (pun totally intended) using Python's Beautiful Soup:

#!/usr/bin/env python

from urllib import request

from bs4 import BeautifulSoup
import os

URL = 'http://winterbash2014.stackexchange.com/'

def load_html_doc(url):
    cached_file = 'page.html'
    if os.path.isfile(cached_file) and os.path.getsize(cached_file):
        with open(cached_file) as fh:
            html_doc = fh.read()
    else:
        html_doc = request.urlopen(url).read()
        with open(cached_file, 'wb') as fh:
            fh.write(html_doc)
    return html_doc

def get_soup(url):
    html_doc = load_html_doc(url)
    return BeautifulSoup(html_doc)

def print_hats_as_gfm_checkboxes(soup):
    for box in soup.find_all('a', attrs={'class': 'box'}):
        name = box.find(attrs={'class': 'hat-name'}).text
        description = box.find(attrs={'class': 'hat-description'}).text
        print('- [ ] {}: {}'.format(name, description))

def print_pretty(soup):
    print(soup.prettify())

def main():
    soup = get_soup(URL)
    print_hats_as_gfm_checkboxes(soup)
    # print_pretty(soup)

if __name__ == '__main__':
    main()

This produces a list of all the hats in the r

Solution

Don't do this:

#!/usr/bin/env python

Some people make the poor excuse that PEP 394 "recommends that python continue to refer to python2 for the time being", but this doesn't even apply to you since your script only supports Python 3!

Use this instead:

#!/usr/bin/env python3

Your code could do with doc comments; at least load_html_doc could. These don't need to be long.

There's nothing wrong with caching files locally, but I would suggest writing to tempfile.gettempdir().
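
A minimal sketch of that suggestion, assuming a hypothetical helper named cache_path and an arbitrary default file name (neither is from the original script):

```python
import os
import tempfile


def cache_path(filename='winterbash2014.html'):
    """Return a path for the cache file inside the system temp directory."""
    # gettempdir() picks the platform's temp directory, e.g. /tmp on Linux.
    return os.path.join(tempfile.gettempdir(), filename)


print(cache_path())
```

The result could then be passed wherever the script currently uses the hard-coded 'page.html', so the cache no longer lands in whatever directory the script happens to be run from.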

request.urlopen returns a file-like object. Moments later you use with on a file object but you neglect to do so on request.urlopen's!

I would do an early return in load_html_doc.

Your os.path.isfile(cached_file) check is strange; if the path is a directory instead, you don't actually deal with the resulting problem. It's not a big deal, but I would use os.path.exists so that the code doesn't look like it handles a case it doesn't.
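
To illustrate the directory case, here is a small sketch (describe_path is a hypothetical helper introduced only for this demonstration). It shows that a directory passes os.path.exists but not os.path.isfile, and that the fallback branch which rewrites the cache would fail on it anyway:

```python
import os
import tempfile


def describe_path(path):
    """Return (exists, isfile) for path so the two checks can be compared."""
    return os.path.exists(path), os.path.isfile(path)


with tempfile.TemporaryDirectory() as directory:
    # A directory exists, but it is not a regular file.
    print(describe_path(directory))
    try:
        # This mirrors the fallback branch that rewrites the cache.
        with open(directory, 'wb'):
            pass
    except OSError as error:
        # IsADirectoryError or PermissionError, depending on the platform.
        print('cannot write cache:', type(error).__name__)
```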

cached_file would be better as a default argument, and I'd call it cache_file or even cache_path. IMHO fh isn't that great a name either.

Here's the updated code, still writing to the same file:

def load_html_doc(url, cache_file='page.html'):
    """Return the page at url as bytes, using cache_file as a local cache."""
    if os.path.exists(cache_file) and os.path.getsize(cache_file):
        # Read in binary mode so both branches return bytes.
        with open(cache_file, 'rb') as cache:
            return cache.read()

    with request.urlopen(url) as webpage:
        html_doc = webpage.read()
    with open(cache_file, 'wb') as cache:
        cache.write(html_doc)
    return html_doc

After all of the comments on load_html_doc, I'm sad (happy?) to say there isn't much to talk about with the rest of it. I would remove the # print_pretty(soup) comment, though.

Context

StackExchange Code Review Q#74791, answer score: 3