patternpythonMinor
Dirty XML-to-JSON parser
Viewed 0 times
jsonparserxmldirty
Problem
I have that XML-to-JSON function based on
Usage:
Is it really that reliable?
ElementTree. It looks very simple but until now it does what it's supposed to do: give a JSON description of the document's ElementTree.import xml.etree.ElementTree as ET
def dirtyParser(node):
'''dirty xml parser
parses tag, attributes, text, children
recursive
returns a nested dict'''
# mapping with recursive call
res = {'tag':node.tag,
'attributes': node.attrib,
'text': node.text,
'children': [dirtyParser(c) for c in node.getchildren()]}
# remove blanks and empties
for k, v in res.items():
if v in ['', '\n', [], {}, None]:
res.pop(k, None)
return resUsage:
>>> some_xml = ET.fromstring(u'Maldonado, Gavin G.Veda Parks')
>>> dirtyParser(some_xml)
>>> {'tag': 'records', 'children': [{'tag': 'record', 'children': [{'tag': 'him', 'text': 'Maldonado, Gavin G.'}, {'tag': 'her', 'text': 'Veda Parks'}]}]}Is it really that reliable?
Solution
It's probably not reliable except if your XML data is simple.
-
More obviously, if you were representing this data using JSON, your data would look like:
or something like that. That's very different from what you're outputting, so your progrem does not really represent your data using JSON, but represents the XML representing your data using JSON. But converting to "real JSON" is much more difficult except for some very specific XML, and would not be useful as a general purpose converter.
This program may be useful to you in some specific scenarios, but you'd better explicitly state what kind of data you accept and reject anything else. Also, what's the point of this?
- XML is tricky!
- You forgot the
.tailattribute, which contains any text after a given attribute.
- Whitespace is significant, so you won't be able to go back to the same XMl document.
- And everything else I don't know about.
- The way Python represents dictionary is different from JSON. For example, JSON only allows
"for quoting, not'. You can usejson.dumpsto solve this problem.
-
More obviously, if you were representing this data using JSON, your data would look like:
"records": [
{"him": "Maldonado, Gavin G.",
"her": "Veda Parks"}
]or something like that. That's very different from what you're outputting, so your progrem does not really represent your data using JSON, but represents the XML representing your data using JSON. But converting to "real JSON" is much more difficult except for some very specific XML, and would not be useful as a general purpose converter.
This program may be useful to you in some specific scenarios, but you'd better explicitly state what kind of data you accept and reject anything else. Also, what's the point of this?
Code Snippets
"records": [
{"him": "Maldonado, Gavin G.",
"her": "Veda Parks"}
]Context
StackExchange Code Review Q#63530, answer score: 3
Revisions (0)
No revisions yet.