patternpythonMinor
Script for saving top wallpapers from wallbase.cc into a directory
Viewed 0 times
scriptdirectorytopwallbasewallpapersintoforfromsaving
Problem
I am newish to Python and I am looking for some extra eyes on a script I wrote. I wrote this script to learn more about web-scraping and using requests and Beautiful Soup. I use it to connect to the website wallbase.cc and save the top 32 wallpapers from the last 3 days into a directory where I later cycle through displaying them on my desktop.
I'm really looking for some best practices suggestions, whatever you think is appropriate. I'm open to any constructive criticism.
```
#!/usr/bin/env python
import os
import requests
import sys
import urllib.request
from bs4 import BeautifulSoup
import hashlib
def get_pic(number_imgs, timespan):
"""Returns a list of the specified number of top images over the specified
timeframe from wallbase.cc
"""
url = "http://wallbase.cc/toplist"
opts = {
'section':'wallpapers', 'q':'', 'res_opt':'gteq',
'res':'1920x1080', 'aspect':'1.77', 'purity':'100',
'board':'21', 'thpp':number_imgs, 'ts':timespan
}
htmltext = requests.get(url, params = opts)
page_urls = []
img_urls = []
soup = BeautifulSoup(htmltext.content)
results = soup.findAll("a")
for r in results:
if "http://wallbase.cc/wallpaper/" in r['href']:
page_urls.append(r['href'])
for p in page_urls:
wp_page = requests.get(p)
wp_soup = BeautifulSoup(wp_page.content)
wp_results = wp_soup.findAll("img")
for res in wp_results:
if "http://wallpapers.wallbase.cc/" in res['src']:
img_urls.append(res['src'])
return img_urls
def save_pic(url):
"""Saves a file to disk when given a URL"""
save_path = sys.argv[1]
hs = hashlib.md5(url.encode('UTF-8')).hexdigest()
file_ext = url.split(".")[-1]
to_save = (save_path + hs + "." + file_ext)
if to_save != "":
if os.path.isfile(to_save):
print(hs + "." + str(file_ext) + "\texists, skipping...")
else:
print(hs + "." + str(f
I'm really looking for some best practices suggestions, whatever you think is appropriate. I'm open to any constructive criticism.
```
#!/usr/bin/env python
import os
import requests
import sys
import urllib.request
from bs4 import BeautifulSoup
import hashlib
def get_pic(number_imgs, timespan):
"""Returns a list of the specified number of top images over the specified
timeframe from wallbase.cc
"""
url = "http://wallbase.cc/toplist"
opts = {
'section':'wallpapers', 'q':'', 'res_opt':'gteq',
'res':'1920x1080', 'aspect':'1.77', 'purity':'100',
'board':'21', 'thpp':number_imgs, 'ts':timespan
}
htmltext = requests.get(url, params = opts)
page_urls = []
img_urls = []
soup = BeautifulSoup(htmltext.content)
results = soup.findAll("a")
for r in results:
if "http://wallbase.cc/wallpaper/" in r['href']:
page_urls.append(r['href'])
for p in page_urls:
wp_page = requests.get(p)
wp_soup = BeautifulSoup(wp_page.content)
wp_results = wp_soup.findAll("img")
for res in wp_results:
if "http://wallpapers.wallbase.cc/" in res['src']:
img_urls.append(res['src'])
return img_urls
def save_pic(url):
"""Saves a file to disk when given a URL"""
save_path = sys.argv[1]
hs = hashlib.md5(url.encode('UTF-8')).hexdigest()
file_ext = url.split(".")[-1]
to_save = (save_path + hs + "." + file_ext)
if to_save != "":
if os.path.isfile(to_save):
print(hs + "." + str(file_ext) + "\texists, skipping...")
else:
print(hs + "." + str(f
Solution
Your code is nice, well documented, properly formatted and easy to understand so I won't have to much to say.
- You can have a read at PEP 8 as your code is not exactly perfect for details like spacing (you should have whitespaces after
:in your dicts, you shouldn't have trailing whitespaces at the end of the line, etc). You'll find cool tools to check stuff for you automatically :pep8,pylint,pyflakes,pychecker.
- In
save_pic, there is no wayto_save != ""can be false. Indeed, because of the way you buildto_save, I know for sure that it will contain a "." making it a non-empty string. If you want to be sure that the property keeps being True, you can useassertlike this :assert to_save != "". Otherwise, you can just get rid of the test.
- The fact that you are calling
str()onfile_extinsave_picis a bit confusing : if this wasn't to behave like a string in the first place, I assumeto_save = (save_path + hs + "." + file_ext)would fail.
save_picas a function probably shouldn't rely onsys.argv[1]: I think it would be much better if this information was to be provided as a parameter. Also, to join filename, you can use os.path.join().
- in
save_pic(again), I think you should store the value ofhs + "." + file_extin a variable instead of repeating this in 3 different places. Also, you do not need parenthesis when assigning toto_save.
- in
get_pic, you could use list comprehension to define page_urls :page_urls = [r['href'] for r in soup.findAll("a") if "http://wallbase.cc/wallpaper/" in r['href']].
Context
StackExchange Code Review Q#55038, answer score: 2
Revisions (0)
No revisions yet.