patternpythonMinor
Extending the functionality of lxml.etree
Viewed 0 times
etreethelxmlfunctionalityextending
Problem
I wrote a class to slightly customize the behavior of
The function
lxml.etree.ElementTree and I use it quite extensively. It works great, but there are a few methods that I'm not sure how I wrote, and there are a few other methods that seem awfully redundant. I'll elaborate on specific questions below, but first here's the code.import lxml.etree as old_etree
from xml_utilities import clean_xml
from datetime import datetime
VERSION = 'X.X.X.X'
TOOL = 'ExampleTool'
class etree(old_etree._ElementTree):
@staticmethod
def parse(path):
old_tdx = old_etree.parse(path)
new_tdx = etree(old_tdx)
new_tdx._setroot(old_tdx.getroot())
return new_tdx
@staticmethod
def fromstring(string):
old_tdx = old_etree.fromstring(string)
new_tdx = etree(old_tdx.getroottree())
new_tdx._setroot(old_tdx)
return new_tdx
@staticmethod
def getISOTime():
iso_time = datetime.now().isoformat()
format_time = iso_time.split('.')[0] + 'Z'
return format_time
@staticmethod
def SubElement(*args, **kargs):
return old_etree.SubElement(*args, **kargs)
@staticmethod
def Element(*args, **kargs):
return old_etree.Element(*args, **kargs)
def getMetaNode(self, name, default=''):
metas = self.findall('meta')
for meta in metas:
if meta.get('name') == name:
return meta
return old_etree.SubElement(self.getroot(), 'meta',
attrib={'name': name, 'value': default})
def setMetaNode(self, name, value):
node = self.getMetaNode(name)
node.set('value', value)
def write(self, path, updateTool=True):
self.setMetaNode('saved', etree.getISOTime())
if updateTool:
self.setMetaNode('version', VERSION)
self.setMetaNode('type', TOOL)
super(etree, self).write(path)
clean_xml(path)The function
clean_xml is custom written and largely irrelevant: It escapes troublesome characters that ElementTree.write doesn't by default (this is an issue with thSolution
Boilerplate
In those boilerplate functions, I don't know what "tdx" means. I guess it's linked to the file specification you use.
Those functions are the main problem I have with your code. You mixed the
This would make things way less confusing because it would be closer to how
You would do
I don't have much to say about the implementation of the functions apart from the return type: ugly but I did not find a way to make things better. And yes, it's quite easy to understand them.
You no longer would need do to this with my proposal above.
getISOTime
Adding the 'Z' means you're saying this is an UTC time, but it's not since you've used
getMetaNode
You can use lxml XPath support here if you want to avoid the loop:
Write
nitpick: Cleaning at the ElementTree would prevent your filesystem to see two different versions. Say at some point you decide to use watchdog, the callback will kick in before you have a chance to run
Answering your questions
Since lxml is written in Cython, I think you can't monkeypatch lxml directly anyway, so subclassing is the way to go. lxml could have chosen to make things easier by using
See "Boilerplate" above.
Why do you want to get an ElementTree rather than an Element? I think it's a bad idea for two reasons:
If anything, I think I would use more than two lines. If you want to stop worrying about such things, I highly recommand yapf.
Originally, I tried to just overwrite
class etree(old_etree._ElementTree):
@staticmethod
def parse(path):
old_tdx = old_etree.parse(path)
new_tdx = etree(old_tdx)
new_tdx._setroot(old_tdx.getroot())
return new_tdx
@staticmethod
def fromstring(string):
old_tdx = old_etree.fromstring(string)
new_tdx = etree(old_tdx.getroottree())
new_tdx._setroot(old_tdx)
return new_tdxIn those boilerplate functions, I don't know what "tdx" means. I guess it's linked to the file specification you use.
Those functions are the main problem I have with your code. You mixed the
lxml.etree module with the lxml.etree.ElementTree class. Those four methods are methods of the module, not the class! I think you should:- have a real
etreemodule
- put your current
etreeclass (renamed asElementTree) in it
- add those functions in the module, not the class
- fix the return type to return what lxml returns
This would make things way less confusing because it would be closer to how
lxml.etree and the standard xml.etree work. Your coworkers would be able to pick your code faster.You would do
from lxml.etree import * and only redefine the functions you want to redefine. Your probably can't expect a perfect compatibility, but at least the basic API would be the same.I don't have much to say about the implementation of the functions apart from the return type: ugly but I did not find a way to make things better. And yes, it's quite easy to understand them.
@staticmethod
def SubElement(*args, **kargs):
return old_etree.SubElement(*args, **kargs)
@staticmethod
def Element(*args, **kargs):
return old_etree.Element(*args, **kargs)You no longer would need do to this with my proposal above.
getISOTime
@staticmethod
def getISOTime():
iso_time = datetime.now().isoformat()
format_time = iso_time.split('.')[0] + 'Z'
return format_timeAdding the 'Z' means you're saying this is an UTC time, but it's not since you've used
datetime.now() instead of datetime.utcnow(). I live in UTC+4, and the format_time you're returning is 4 hours off for me.getMetaNode
def getMetaNode(self, name, default=''):
metas = self.findall('meta')
for meta in metas:
if meta.get('name') == name:
return metaYou can use lxml XPath support here if you want to avoid the loop:
meta.xpath("meta[name = $name]", name=name).Write
def write(self, path, updateTool=True):
self.setMetaNode('saved', etree.getISOTime())
if updateTool:
self.setMetaNode('version', VERSION)
self.setMetaNode('type', TOOL)
super(etree, self).write(path)
clean_xml(path)nitpick: Cleaning at the ElementTree would prevent your filesystem to see two different versions. Say at some point you decide to use watchdog, the callback will kick in before you have a chance to run
clean_xml, which could cause subtle bugs.Answering your questions
- Is there a better way to extend the functionality of
lxml.etreethan inheriting from_ElementTree? Ultimately, I want to be changing the behavior of the methodElementTree.writeand to add the methodsElementTree.getMetaNodeandElementTree.setMetaNode.
Since lxml is written in Cython, I think you can't monkeypatch lxml directly anyway, so subclassing is the way to go. lxml could have chosen to make things easier by using
__new__ just like numpy does for facilitating ndarray subclassing.- Is there a better way of incorporating the methods
parse,fromstring,SubElement, andElement, rather than just creating thin wrappers that reference the original methods?
See "Boilerplate" above.
- Does anyone understand what my methods
parseandfromstringare doing? I know that my methodfromstringhas different functionality fromlxml.etree.fromstringbecause I want it to return anElementTreerather than anElement. These methods were written largely through guess-and-check (oops).
Why do you want to get an ElementTree rather than an Element? I think it's a bad idea for two reasons:
- It's very easy to get an ElementTree from an Element.
- It's already hard enough to reason about the type of object you get when using lxml, if we can't use our knowledge about ElementTree/lxml it's only going to get worse.
- Is there a better way to write the
return old_etree.SubElement(...)line so it doesn't have to take up two lines? This isn't major, but...
If anything, I think I would use more than two lines. If you want to stop worrying about such things, I highly recommand yapf.
Originally, I tried to just overwrite
lxml.etree.ElementTree.write directly, but that throws the error AttributeError: 'lxml.etree._ElementTree' object attribute 'write' is read-only. General critiques are also welcomed. I know that I should have comments in here, but everything is pretty straight-forward except for the two methods that I don't undersCode Snippets
class etree(old_etree._ElementTree):
@staticmethod
def parse(path):
old_tdx = old_etree.parse(path)
new_tdx = etree(old_tdx)
new_tdx._setroot(old_tdx.getroot())
return new_tdx
@staticmethod
def fromstring(string):
old_tdx = old_etree.fromstring(string)
new_tdx = etree(old_tdx.getroottree())
new_tdx._setroot(old_tdx)
return new_tdx@staticmethod
def SubElement(*args, **kargs):
return old_etree.SubElement(*args, **kargs)
@staticmethod
def Element(*args, **kargs):
return old_etree.Element(*args, **kargs)@staticmethod
def getISOTime():
iso_time = datetime.now().isoformat()
format_time = iso_time.split('.')[0] + 'Z'
return format_timedef getMetaNode(self, name, default=''):
metas = self.findall('meta')
for meta in metas:
if meta.get('name') == name:
return metadef write(self, path, updateTool=True):
self.setMetaNode('saved', etree.getISOTime())
if updateTool:
self.setMetaNode('version', VERSION)
self.setMetaNode('type', TOOL)
super(etree, self).write(path)
clean_xml(path)Context
StackExchange Code Review Q#119559, answer score: 4
Revisions (0)
No revisions yet.