HiveBrain v1.2.0
Get Started
← Back to all entries
snippetpythonMinor

Using a monkey-patched XML parser to convert journal articles to ePub format

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
formatpatchedjournalparserconvertmonkeyxmlarticlesepubusing

Problem

I am working on a project that involves XML parsing, and for the job I am using xml.dom.minidom. During development I identified several patterns of processing that I refactored into discrete methods. The code shown in the snippet below shows my definition of an Article class that is instantiated during initial processing and is later passed to other classes for interpretation and output.

Because I wanted to consolidate my refactored methods into a single definition — and because I felt they were best thought of as extended minidom methods, I removed them from the Interpretation/Output classes and then monkey-patched them into the minidom module so that they would be available to any class operating on the Article's document.

```
# -- coding: utf-8 --
import openaccess_epub.utils.element_methods as element_methods
import openaccess_epub.utils as utils
from openaccess_epub.jpts.jptsmetadata import JPTSMetaData20, JPTSMetaData23, JPTSMetaData30
import os.path
import sys
import shutil
import xml.dom.minidom as minidom
import logging

log = logging.getLogger('Article')

#Monkey patching in some extended methods for xml.dom.minidom classes
minidom.Node.removeSelf = element_methods.removeSelf
minidom.Node.replaceSelfWith = element_methods.replaceSelfWith
minidom.Node.elevateNode = element_methods.elevateNode
minidom.Element.getChildrenByTagName = element_methods.getChildrenByTagName
minidom.Element.removeAllAttributes = element_methods.removeAllAttributes
minidom.Element.getAllAttributes = element_methods.getAllAttributes
minidom.Element.getOptionalChild = element_methods.getOptionalChild

class Article(object):
"""
A journal article; the top-level element (document element) of the
Journal Publishing DTD, which contains all the metadata and content for
the article.
3.0 Tagset:
http://dtd.nlm.nih.gov/publishing/tag-library/3.0/n-3q20.html
2.0 Tagset:
http://dtd.nlm.nih.gov/publishing/tag-library/2.0/n-9kc0.html
2.3 Tagset:
http

Solution

The problem here is the minidom API is a well known API. Someone new to the code needs to know that you monkey patched it and why you did it. Otherwise they would be scouring the minidom docs looking for your methods. This is generally why monkey patching is a bad idea because it can be confusing to the next reader. Especially when the next reader is a person less experienced with the API or programming language.

As @fge suggests, using composition in some way would be preferable here.

Context

StackExchange Code Review Q#26859, answer score: 3

Revisions (0)

No revisions yet.