HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Script for saving top wallpapers from wallbase.cc into a directory

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
scriptdirectorytopwallbasewallpapersintoforfromsaving

Problem

I am newish to Python and I am looking for some extra eyes on a script I wrote. I wrote this script to learn more about web-scraping and using requests and Beautiful Soup. I use it to connect to the website wallbase.cc and save the top 32 wallpapers from the last 3 days into a directory where I later cycle through displaying them on my desktop.

I'm really looking for some best practices suggestions, whatever you think is appropriate. I'm open to any constructive criticism.

```
#!/usr/bin/env python

import os
import requests
import sys
import urllib.request
from bs4 import BeautifulSoup
import hashlib

def get_pic(number_imgs, timespan):
"""Returns a list of the specified number of top images over the specified
timeframe from wallbase.cc
"""
url = "http://wallbase.cc/toplist"
opts = {
'section':'wallpapers', 'q':'', 'res_opt':'gteq',
'res':'1920x1080', 'aspect':'1.77', 'purity':'100',
'board':'21', 'thpp':number_imgs, 'ts':timespan
}
htmltext = requests.get(url, params = opts)
page_urls = []
img_urls = []
soup = BeautifulSoup(htmltext.content)
results = soup.findAll("a")

for r in results:
if "http://wallbase.cc/wallpaper/" in r['href']:
page_urls.append(r['href'])

for p in page_urls:
wp_page = requests.get(p)
wp_soup = BeautifulSoup(wp_page.content)
wp_results = wp_soup.findAll("img")
for res in wp_results:
if "http://wallpapers.wallbase.cc/" in res['src']:
img_urls.append(res['src'])

return img_urls

def save_pic(url):
"""Saves a file to disk when given a URL"""
save_path = sys.argv[1]
hs = hashlib.md5(url.encode('UTF-8')).hexdigest()
file_ext = url.split(".")[-1]
to_save = (save_path + hs + "." + file_ext)
if to_save != "":
if os.path.isfile(to_save):
print(hs + "." + str(file_ext) + "\texists, skipping...")
else:
print(hs + "." + str(f

Solution

Your code is nice, well documented, properly formatted and easy to understand so I won't have to much to say.

  • You can have a read at PEP 8 as your code is not exactly perfect for details like spacing (you should have whitespaces after : in your dicts, you shouldn't have trailing whitespaces at the end of the line, etc). You'll find cool tools to check stuff for you automatically : pep8, pylint, pyflakes, pychecker.



  • In save_pic, there is no way to_save != "" can be false. Indeed, because of the way you build to_save, I know for sure that it will contain a "." making it a non-empty string. If you want to be sure that the property keeps being True, you can use assert like this : assert to_save != "". Otherwise, you can just get rid of the test.



  • The fact that you are calling str() on file_ext in save_pic is a bit confusing : if this wasn't to behave like a string in the first place, I assume to_save = (save_path + hs + "." + file_ext) would fail.



  • save_pic as a function probably shouldn't rely on sys.argv[1] : I think it would be much better if this information was to be provided as a parameter. Also, to join filename, you can use os.path.join().



  • in save_pic (again), I think you should store the value of hs + "." + file_ext in a variable instead of repeating this in 3 different places. Also, you do not need parenthesis when assigning to to_save.



  • in get_pic, you could use list comprehension to define page_urls : page_urls = [r['href'] for r in soup.findAll("a") if "http://wallbase.cc/wallpaper/" in r['href']].

Context

StackExchange Code Review Q#55038, answer score: 2

Revisions (0)

No revisions yet.