Getting to Wikipedia's "Philosophy" article using Python
Problem
On Wikipedia, if you click the first non-italicised internal link in the main text of an article that's not within parentheses, and then repeat the process, you usually end up on the "Philosophy" article (see this Wikipedia essay).
To test this idea, I made a simple Python module that does the "clicking" programmatically. Here's the code:
"""
The Philosophy Game
~~~~~~~~~~~~~~~~~~~~~~~~~
Clicking on the first non-parenthesized, non-italicized link,
in the main text of a Wikipedia article, and then repeating
the process for subsequent articles, usually eventually gets
one to the Philosophy article. (See
https://en.wikipedia.org/wiki/Wikipedia:Getting_to_Philosophy
for more information)
The Philosophy Game, written in Python, lets you do the clicking
programmatically.
Basic usage:
>>> from philosophy import PhilosophyGame
>>> game = PhilosophyGame('Python (programming language)')
>>> for s in game.trace():
...     print(s)
...
>>>
Handling errors:
>>> from philosophy import *
>>> game = PhilosophyGame('Python (programming language)')
>>> try:
...     for s in game.trace():
...         print(s)
... except ConnectionError:
...     sys.exit('Network error, please check your connection')
... except MediaWikiError as e:
...     sys.exit('MediaWiki API error {0}: {1}'.format(e.errors['code'],
...                                                    e.errors['info']))
... except LoopException:
...     sys.exit('Loop detected, exiting...')
... except InvalidPageNameError as e:
...     sys.exit(e)
... except LinkNotFoundError as e:
...     sys.exit(e)
Advanced options:
In this example, we set end to 'Multicellular organism', so that
instead of stopping at 'Philosophy', trace() stops there.
>>> game = PhilosophyGame(page='Sandwich', end='Multicellular organism')
In the following example, we set dont_stop to True, so that
trace() disregards the value of `end` and doesn't stop [...]

Solution
Reduce repetition
    @staticmethod
    def valid_page_name(page):
        """
        Checks for valid mainspace Wikipedia page name
        """
        return (page.find('File:') == -1
                and page.find('File talk') == -1
                and page.find('Wikipedia:') == -1
                and page.find('Wikipedia talk:') == -1
                and page.find('Project:') == -1
                and page.find('Project talk:') == -1
                and page.find('Portal:') == -1
                and page.find('Portal talk:') == -1
                and page.find('Special:') == -1
                and page.find('Help:') == -1
                and page.find('Help talk:') == -1
                and page.find('Template:') == -1
                and page.find('Template talk:') == -1
                and page.find('Talk:') == -1
                and page.find('Category:') == -1
                and page.find('Category talk:') == -1)

`and page.find` and `== -1` are repeated 16 times. Use a generator expression instead:

    return all(page.find(non_main) == -1 for non_main in NON_MAIN_CATEGORIES)

Where `NON_MAIN_CATEGORIES` may be saved as a constant, either at the top level or inside this class.

`in`

Your usage of `find` looks like a weird substitute for `in`; you probably mean:

    non_main not in page

When you use `.find(...) == -1`, you are really asking whether the substring is absent, which is exactly what `not in` expresses.

So at last we get:

    return all(non_main not in page for non_main in NON_MAIN_CATEGORIES)

Some REPL example usage of `in` to clear this up:

    >>> "example".find("e")
    0
    >>> "example".find("x")
    1
    >>> "example".find("z")
    -1
    >>> "example".find("z") == -1
    True
    >>> not "z" in "example"
    True
    >>> ("example".find("z") == -1) == (not "z" in "example")
    True
    >>> "z" not in "example" # Just some syntactic sugar
    True

Philosophy and string stripping: Separation of concerns
Why is `strip_parentheses(string)` a method of the philosophy game class? Maybe you need this functionality inside the game, but it is a minor detail.

Think re-use: would anyone expect to find parenthesis stripping inside a class for Philosophy Wikipedia surfing?

Just put it free-floating or inside a `string_utils` module that you may import.

Actually, why a class?
Ignoring `__init__`, which almost any class has, `valid_page_name`, which is trivial, and `strip_parentheses`, which should really not be there, the `PhilosophyGame` class contains just one function.

When you have only one function in a class, you can simplify and avoid the class completely.

    def philosophy_game(start=None, end='Philosophy', ...):
        # Implementation

The usage also becomes a little easier:

    print(list(philosophy_game('Dog')))

Double negatives
I find double negatives needlessly confusing.

    if not self.dont_stop and page == self.end:
        return

Requires some thinking, while:

    if self.should_end and page == self.end:
        return

Reads in a fraction of a second.

Or you may just use an adjective instead of `dont` in the variable name:

    if not self.infinite and page == self.end:
        return

Either is easier to understand than double negation.
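
Putting the suggestions together, here is a rough sketch of how the module could look as plain functions. It is only an illustration, not the original implementation: `first_link` is a hypothetical callable I introduce to stand in for the real Wikipedia lookup, the exception types are generic placeholders for the module's own, and I assume the missing colon in `'File talk'` was a typo.

```python
from typing import Callable, Iterator, Optional

# Prefixes checked by the reviewed code. Assumption: 'File talk' in the
# original was meant to be 'File talk:'.
NON_MAIN_CATEGORIES = (
    'File:', 'File talk:', 'Wikipedia:', 'Wikipedia talk:',
    'Project:', 'Project talk:', 'Portal:', 'Portal talk:',
    'Special:', 'Help:', 'Help talk:', 'Template:', 'Template talk:',
    'Talk:', 'Category:', 'Category talk:',
)


def valid_page_name(page: str) -> bool:
    """Return True if none of the non-mainspace prefixes occurs in the title."""
    return all(non_main not in page for non_main in NON_MAIN_CATEGORIES)


def philosophy_game(start: str,
                    first_link: Callable[[str], Optional[str]],
                    end: str = 'Philosophy',
                    infinite: bool = False) -> Iterator[str]:
    """Yield each page title visited, beginning with `start`.

    `first_link` (hypothetical) maps a title to the next title to follow,
    or to None when the page has no usable link.
    """
    visited = set()
    page = start
    while page is not None:
        if not valid_page_name(page):
            # Placeholder for the module's InvalidPageNameError.
            raise ValueError('not a mainspace page: {!r}'.format(page))
        if page in visited:
            # Placeholder for the module's LoopException.
            raise RuntimeError('loop detected at {!r}'.format(page))
        visited.add(page)
        yield page
        # Positively named flag: no double negative to untangle.
        if not infinite and page == end:
            return
        page = first_link(page)


# Toy link table so the sketch runs without network access.
TOY_LINKS = {'Dog': 'Animal', 'Animal': 'Biology', 'Biology': 'Philosophy'}
print(list(philosophy_game('Dog', TOY_LINKS.get)))
# ['Dog', 'Animal', 'Biology', 'Philosophy']
```

Passing the lookup in as an argument is a design choice made for this sketch: it keeps the traversal logic testable without touching the MediaWiki API.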
Context
StackExchange Code Review Q#114537, answer score: 5