Using a monkey-patched XML parser to convert journal articles to ePub format
Problem
I am working on a project that involves XML parsing, and for the job I am using xml.dom.minidom. During development I identified several patterns of processing that I refactored into discrete methods. The snippet below shows my definition of an Article class that is instantiated during initial processing and later passed to other classes for interpretation and output.
Because I wanted to consolidate my refactored methods into a single definition, and because I felt they were best thought of as extended minidom methods, I removed them from the Interpretation/Output classes and then monkey-patched them into the minidom module so that they would be available to any class operating on the Article's document.
```
# -*- coding: utf-8 -*-
import openaccess_epub.utils.element_methods as element_methods
import openaccess_epub.utils as utils
from openaccess_epub.jpts.jptsmetadata import JPTSMetaData20, JPTSMetaData23, JPTSMetaData30
import os.path
import sys
import shutil
import xml.dom.minidom as minidom
import logging

log = logging.getLogger('Article')

# Monkey patching in some extended methods for xml.dom.minidom classes
minidom.Node.removeSelf = element_methods.removeSelf
minidom.Node.replaceSelfWith = element_methods.replaceSelfWith
minidom.Node.elevateNode = element_methods.elevateNode
minidom.Element.getChildrenByTagName = element_methods.getChildrenByTagName
minidom.Element.removeAllAttributes = element_methods.removeAllAttributes
minidom.Element.getAllAttributes = element_methods.getAllAttributes
minidom.Element.getOptionalChild = element_methods.getOptionalChild


class Article(object):
    """
    A journal article; the top-level element (document element) of the
    Journal Publishing DTD, which contains all the metadata and content for
    the article.
    3.0 Tagset:
    http://dtd.nlm.nih.gov/publishing/tag-library/3.0/n-3q20.html
    2.0 Tagset:
    http://dtd.nlm.nih.gov/publishing/tag-library/2.0/n-9kc0.html
    2.3 Tagset:
    http
```
Solution
The problem here is that minidom is a well-known API. Someone new to the code needs to know that you monkey-patched it, and why. Otherwise they will scour the minidom docs looking for your methods and find nothing. This is generally why monkey patching is a bad idea: it confuses the next reader, especially one who is less experienced with the API or the programming language.
As @fge suggests, using composition in some way would be preferable here.
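One way composition could look is sketched below. This is only an illustration: `NodeWrapper` and its snake_case method names are hypothetical, and the method bodies are my guess at what `removeSelf` and `getChildrenByTagName` do, since the question does not show `element_methods`. The point is that the helpers live on a class you own, so a reader can find them without searching the minidom docs.

```python
import xml.dom.minidom as minidom


class NodeWrapper:
    """Hypothetical wrapper: holds a minidom node instead of patching minidom."""

    def __init__(self, node):
        self.node = node

    def remove_self(self):
        """Detach the wrapped node from its parent and return it (assumed behaviour)."""
        parent = self.node.parentNode
        if parent is not None:
            parent.removeChild(self.node)
        return self.node

    def get_children_by_tag_name(self, tag_name):
        """Return only the direct element children with this tag name (assumed behaviour)."""
        return [
            child for child in self.node.childNodes
            if child.nodeType == minidom.Node.ELEMENT_NODE
            and child.tagName == tag_name
        ]


doc = minidom.parseString(
    "<article><p>one</p><sec><p>nested</p></sec><p>two</p></article>"
)
root = NodeWrapper(doc.documentElement)

# Unlike getElementsByTagName, this skips the <p> nested inside <sec>.
paragraphs = root.get_children_by_tag_name("p")
direct_count = len(paragraphs)

# Remove the first direct <p>; minidom itself is left untouched.
NodeWrapper(paragraphs[0]).remove_self()
remaining_xml = doc.documentElement.toxml()
```

The cost is an extra wrapping step at each call site, but the minidom classes stay exactly as documented, and the extended behaviour is discoverable from your own module.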
Context
StackExchange Code Review Q#26859, answer score: 3