
Getting to Wikipedia's "Philosophy" article using Python

Submitted by: @import:stackexchange-codereview

Problem

On Wikipedia, if you click the first non-italicised internal link in the main text of an article that's not within parentheses, and then repeat the process, you usually end up on the "Philosophy" article (see this Wikipedia essay).

To test this idea, I made a simple Python module that does the "clicking" programmatically. Here's the code:

``
"""
The Philosophy Game
~~~~~~~~~~~~~~~~~~~~~~~~~

Clicking on the first non-parenthesized, non-italicized link,
in the main text of a Wikipedia article, and then repeating
the process for subsequent articles, usually eventually gets
one to the Philosophy article. (See
https://en.wikipedia.org/wiki/Wikipedia:Getting_to_Philosophy
for more information)

The Philosophy Game, written in Python, lets you do the clicking
programmatically.

Basic usage:

>>> from philosophy import PhilosophyGame
>>> game = PhilosophyGame('Python (programming language)')
>>> for s in game.trace():
...     print(s)
...
>>>

Handling errors:
>>> from philosophy import *
>>> game = PhilosophyGame('Python (programming language)')
>>> try:
...     for s in game.trace():
...         print(s)
... except ConnectionError:
...     sys.exit('Network error, please check your connection')
... except MediaWikiError as e:
...     sys.exit('MediaWiki API error {0}: {1}'.format(e.errors['code'],
...                                                    e.errors['info']))
... except LoopException:
...     sys.exit('Loop detected, exiting...')
... except InvalidPageNameError as e:
...     sys.exit(e)
... except LinkNotFoundError as e:
...     sys.exit(e)

Advanced options:

In this example, we set end to 'Multicellular organism', so that
instead of stopping at 'Philosophy', trace() stops there.
>>> game = PhilosophyGame(page='Sandwich', end='Multicellular organism')

In the following example, we set
dont_stop to True, so that
trace() disregards the value of
end` and doesn't sto

Solution

Reduce repetition

@staticmethod
def valid_page_name(page):
    """
    Checks for valid mainspace Wikipedia page name
    """
    return (page.find('File:') == -1
        and page.find('File talk') == -1
        and page.find('Wikipedia:') == -1
        and page.find('Wikipedia talk:') == -1
        and page.find('Project:') == -1
        and page.find('Project talk:') == -1
        and page.find('Portal:') == -1
        and page.find('Portal talk:') == -1
        and page.find('Special:') == -1
        and page.find('Help:') == -1
        and page.find('Help talk:') == -1
        and page.find('Template:') == -1
        and page.find('Template talk:') == -1
        and page.find('Talk:') == -1
        and page.find('Category:') == -1
        and page.find('Category talk:') == -1)


The page.find(...) == -1 check is repeated 16 times, joined by and. Use a generator expression instead:

return all(page.find(non_main) == -1 for non_main in NON_MAIN_CATEGORIES)


Here NON_MAIN_CATEGORIES can be defined as a constant, either at module level or inside the class.

in

Your use of find looks like an awkward substitute for in; you probably mean:

non_main not in page


wherever you currently write page.find(non_main) == -1.

So in the end we get:

return all(non_main not in page for non_main in NON_MAIN_CATEGORIES)
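Putting the pieces together, the result might look like the sketch below. The prefix list is copied from the original valid_page_name (including 'File talk' without a trailing colon, exactly as written there). The module-level placement of the constant and the standalone function form are assumptions made for illustration, not the author's final code.

```python
# Namespace prefixes copied from the original valid_page_name.
# 'File talk' has no trailing colon in the original; it is kept as-is here.
NON_MAIN_CATEGORIES = (
    'File:', 'File talk', 'Wikipedia:', 'Wikipedia talk:',
    'Project:', 'Project talk:', 'Portal:', 'Portal talk:',
    'Special:', 'Help:', 'Help talk:', 'Template:', 'Template talk:',
    'Talk:', 'Category:', 'Category talk:',
)


def valid_page_name(page):
    """Return True if page contains none of the non-mainspace prefixes."""
    return all(non_main not in page for non_main in NON_MAIN_CATEGORIES)


print(valid_page_name('Dog'))        # True
print(valid_page_name('Talk:Dog'))   # False
```

Like the original, this is a substring test, so a prefix appearing anywhere in the title makes the name invalid.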


Some REPL examples comparing find and in to clear this up:

>>> "example".find("e")
0
>>> "example".find("x")
1
>>> "example".find("z")
-1
>>> "example".find("z") == -1
True
>>> not "z" in "example"
True
>>> ("example".find("z") == -1) == (not "z" in "example")
True
>>> "z" not in "example" # Just some syntactic sugar
True


Philosophy and string stripping: Separation of concerns

Why is strip_parentheses(string) a method of the philosophy game class? Maybe you need this functionality inside the game, but it is a minor detail.

Think about reuse: would anyone expect to find a parenthesis-stripping helper inside a class for surfing Wikipedia towards Philosophy?

Just make it a free-standing function, or put it in a string_utils module that you can import.
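As a rough illustration of what such a free-standing helper could look like, here is a minimal sketch. The original strip_parentheses implementation is not shown in this entry, so the body below is a hypothetical stand-in (a simple depth counter), not the author's code. It would live in a module such as string_utils and know nothing about Wikipedia.

```python
def strip_parentheses(text):
    """Remove parenthesized spans, including nested ones (hypothetical sketch).

    An unmatched ')' is kept, since there is nothing for it to close.
    """
    depth = 0   # how many '(' are currently open
    kept = []
    for char in text:
        if char == '(':
            depth += 1
        elif char == ')' and depth > 0:
            depth -= 1
        elif depth == 0:
            kept.append(char)
    return ''.join(kept)


print(strip_parentheses('a (b (c) d) e'))   # prints "a  e"
```

Because the function takes and returns a plain string, it can be tested and reused without constructing a game object.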

Actually, why a class?

Ignoring __init__, which almost every class has, valid_page_name, which is trivial, and strip_parentheses, which really should not be there, the PhilosophyGame class contains just one function.

When you have only one function in a class, you can simplify and avoid the class completely.

def philosophy_game(start=None, end='Philosophy', ...):
    # Implementation


Usage also becomes a little easier:

print(list(philosophy_game('Dog')))
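To make the generator-function shape concrete, here is a small runnable sketch. It does not talk to Wikipedia: the get_first_link parameter and the FAKE_LINKS table are hypothetical stand-ins for the real link lookup, the link chain is made up for the demo, and RuntimeError stands in for the original LoopException.

```python
def philosophy_game(start, get_first_link, end='Philosophy'):
    """Yield page titles from start until end is reached (illustrative sketch).

    get_first_link is a hypothetical callable mapping a title to the next one.
    """
    visited = set()
    page = start
    while page not in visited:
        yield page
        if page == end:
            return
        visited.add(page)
        page = get_first_link(page)
    # We came back to a page we have already yielded.
    raise RuntimeError('Loop detected at {!r}'.format(page))


# Made-up link table for the demo; not real Wikipedia data.
FAKE_LINKS = {
    'Dog': 'Animal',
    'Animal': 'Biology',
    'Biology': 'Science',
    'Science': 'Philosophy',
}

print(list(philosophy_game('Dog', FAKE_LINKS.get)))
# ['Dog', 'Animal', 'Biology', 'Science', 'Philosophy']
```

Passing the link lookup in as an argument is one possible design choice; it keeps the traversal logic testable without network access.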


Double negatives

I find double negatives needlessly confusing.

if not self.dont_stop and page == self.end:
    return


Requires some thinking, while:

if self.should_end and page == self.end:
    return


Reads in a fraction of a second.

Or you may just use a positive adjective instead of dont in the variable name:

if not self.infinite and page == self.end:
    return


Either is easier to understand than double negation.
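A tiny sketch of the second variant, written as a plain function so it runs on its own. The name reached_end and its signature are assumptions made for illustration; only the infinite flag and the comparison come from the suggestion above.

```python
def reached_end(page, end='Philosophy', infinite=False):
    """Return True when the trace should stop at this page (hypothetical helper)."""
    return not infinite and page == end


print(reached_end('Philosophy'))                  # True
print(reached_end('Philosophy', infinite=True))   # False
print(reached_end('Dog'))                         # False
```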

Context

StackExchange Code Review Q#114537, answer score: 5