pattern python Minor
Small program to download Wikipedia articles to PDF
program wikipedia articles small download pdf
Problem
I made a small app to download Wikipedia articles (and, optionally, the articles they link to) as PDFs to take on the go. Eventually I'd like to add a text-to-speech option that saves the article as an MP3, but that's the next step. Any thoughts would be helpful.
import wikipedia
import pdfkit
import os
from gtts import gTTS

class Page:
    def __init__(self):
        """Set default pdfkit options"""
        self.pdfOptions = {
            'page-size': 'Letter',
            'margin-top': '0.75in',
            'margin-right': '0.75in',
            'margin-bottom': '0.75in',
            'margin-left': '0.75in',
            'javascript-delay' : 2000,
            'minimum-font-size': 512
        }
        self.targetDir = os.path.dirname(os.path.realpath(__file__))
        self.includeLinks = False

    def getArticle(self, articleTitle):
        """fetch the article from wiki by title"""
        self.page = wikipedia.page(articleTitle)
        try:
            self.page.summary
        except wikipedia.exceptions.DisambiguationError as e:
            print "Multiple articles with that name: " + e.options

    def setURL(self,URL):
        """fetch the article by URL"""
        pass

    def download(self):
        """download the article (and maybe the articles it links to"""
        if self.includeLinks == False:
            filename = self.targetDir+"/"+self.page.title+'.pdf'
            pdfkit.from_url(self.page.url, filename, options = self.pdfOptions)
        else:
            for link in self.page.links:
                linkedPage = wikipedia.page(link)
                print "Downloading " + linkedPage.url
                filename = self.targetDir+"/"+linkedPage.title+'.pdf'
                pdfkit.from_url(linkedPage.url, filename, options=self.pdfOptions)

    def speak(self):
        pass
Solution
Here are some stylistic and code style points:
- Organize your imports as per PEP8 guidelines: first the standard-library imports, then third-party imports, then your "local" imports. Also, remove the unused gTTS import (from gtts import gTTS):
    import os

    import pdfkit
    import wikipedia
- Docstrings should start with a capital letter and end with a period.
- Fix the variable naming: in Python, the convention is the lower_case_with_underscores naming style (PEP8 reference).
- I think you should define pdfOptions as a module-level (or configuration-layer-level) constant instead of defining it as an instance variable.
- if self.includeLinks == False: can be simplified to if not self.includeLinks:
- You should have spaces around the operators inside expressions (reference).
- I think you can handle both cases in your download() method in a unified way:
    def download(self):
        # Use titles in both cases: wikipedia.page() expects a title string,
        # so the single-article case wraps self.page.title, not self.page.
        titles = self.page.links if self.includeLinks else [self.page.title]
        for title in titles:
            page = wikipedia.page(title)
            print("Downloading " + page.url)
            filename = self.targetDir + "/" + page.title + '.pdf'
            pdfkit.from_url(page.url, filename, options=self.pdfOptions)
- I would also use os.path.join() instead of string-concatenating filename paths.
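Several of the points above (module-level options, lower_case_with_underscores names, a single download loop, and os.path.join()) can be combined in one place. The following is only a minimal sketch, not the original author's code: the names PDF_OPTIONS, titles_to_download, pdf_path, fetch_url and render_pdf are hypothetical, and the network and PDF calls are passed in as callables so the path and selection logic can be checked without installing wikipedia or pdfkit.

```python
import os

# Module-level constant instead of a per-instance attribute
# (values copied from the question's pdfOptions).
PDF_OPTIONS = {
    'page-size': 'Letter',
    'margin-top': '0.75in',
    'margin-right': '0.75in',
    'margin-bottom': '0.75in',
    'margin-left': '0.75in',
    'javascript-delay': 2000,
    'minimum-font-size': 512,
}


def titles_to_download(page_title, linked_titles, include_links):
    """Return the linked titles, or only the page's own title."""
    return list(linked_titles) if include_links else [page_title]


def pdf_path(target_dir, title):
    """Build the output path with os.path.join()."""
    return os.path.join(target_dir, title + '.pdf')


def download(page_title, linked_titles, include_links, target_dir,
             fetch_url, render_pdf):
    """Render every selected article to a PDF file.

    fetch_url(title) and render_pdf(url, path) are hypothetical injected
    callables; see the usage note below for one way to supply them.
    """
    for title in titles_to_download(page_title, linked_titles, include_links):
        url = fetch_url(title)
        print("Downloading " + url)
        render_pdf(url, pdf_path(target_dir, title))
```

With the real libraries, the two callables could be supplied as lambda title: wikipedia.page(title).url and lambda url, path: pdfkit.from_url(url, path, options=PDF_OPTIONS), which are the same calls the question already uses.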
Code Snippets
import os

import pdfkit
import wikipedia

def download(self):
    # Use titles in both cases: wikipedia.page() expects a title string,
    # so the single-article case wraps self.page.title, not self.page.
    titles = self.page.links if self.includeLinks else [self.page.title]
    for title in titles:
        page = wikipedia.page(title)
        print("Downloading " + page.url)
        filename = self.targetDir + "/" + page.title + '.pdf'
        pdfkit.from_url(page.url, filename, options=self.pdfOptions)
Context
StackExchange Code Review Q#160784, answer score: 4