Dirty XML-to-JSON parser

Submitted by: @import:stackexchange-codereview

Problem

I have this XML-to-JSON function based on ElementTree. It looks very simple, but so far it does what it's supposed to do: give a JSON description of the document's ElementTree.

import xml.etree.ElementTree as ET

def dirtyParser(node):
    '''dirty xml parser
    parses tag, attributes, text, children
    recursive
    returns a nested dict'''

    # mapping with recursive call
    # (iterate the node directly: getchildren() was removed in Python 3.9)
    res = {'tag': node.tag,
           'attributes': node.attrib,
           'text': node.text,
           'children': [dirtyParser(c) for c in node]}

    # remove blanks and empties
    # (loop over a copy: popping while iterating raises RuntimeError in Python 3)
    for k, v in list(res.items()):
        if v in ['', '\n', [], {}, None]:
            res.pop(k, None)

    return res


Usage:

>>> some_xml = ET.fromstring(u'<records><record><him>Maldonado, Gavin G.</him><her>Veda Parks</her></record></records>')
>>> dirtyParser(some_xml)
{'tag': 'records', 'children': [{'tag': 'record', 'children': [{'tag': 'him', 'text': 'Maldonado, Gavin G.'}, {'tag': 'her', 'text': 'Veda Parks'}]}]}


Is it really that reliable?

Solution

It's probably not reliable unless your XML data is simple.

  • XML is tricky!

  • You forgot the .tail attribute, which contains any text that follows an element's closing tag.

  • Whitespace is significant, so you won't be able to get back to the same XML document.

  • And everything else I don't know about.

  • The way Python represents a dictionary is different from JSON. For example, JSON only allows " for quoting, not '. You can use json.dumps to solve this problem.
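
To make the .tail and json.dumps points concrete, here is a minimal sketch. The helper name parse_with_tail and the sample document are assumptions made for illustration; they are not part of the original question.

```python
import json
import xml.etree.ElementTree as ET


def parse_with_tail(node):
    """Variant of the question's parser that also records .tail (hypothetical helper)."""
    res = {
        'tag': node.tag,
        'attributes': dict(node.attrib),
        'text': node.text,
        # Text between this element's closing tag and the next tag.
        'tail': node.tail,
        'children': [parse_with_tail(child) for child in node],
    }
    # Build a new dict rather than deleting keys while iterating.
    return {k: v for k, v in res.items() if v not in ('', '\n', [], {}, None)}


# Mixed content: " world" follows </b>, so it lives in b.tail, not in p.text.
doc = ET.fromstring('<p>Hello <b>big</b> world</p>')
parsed = parse_with_tail(doc)

# json.dumps emits real JSON text (double quotes, null instead of None).
print(json.dumps(parsed, indent=2))
```

Without the 'tail' key, the " world" fragment would be dropped silently. Whitespace-only text other than a single newline still passes the filter, so round-tripping remains lossy unless you decide how to treat it.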



More obviously, if you were representing this data using JSON, your data would look like:

"records": [
    {"him": "Maldonado, Gavin G.",
     "her": "Veda Parks"}
]


or something like that. That's very different from what you're outputting, so your program does not really represent your data in JSON; it represents the XML that represents your data in JSON. Converting to "real JSON" is much more difficult, except for some very specific XML, and would not be useful as a general-purpose converter.
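
For one very specific shape, the "real JSON" conversion can be sketched in a few lines. This assumes a root element holding repeated child records, each made only of uniquely named leaf elements; the function name records_to_json and the sample markup are illustrative assumptions.

```python
import json
import xml.etree.ElementTree as ET


def records_to_json(xml_text):
    """Convert <root><item><field>...</field></item></root> to compact JSON.

    Assumes every item holds only uniquely named leaf elements; attributes,
    tails and nested elements are ignored (hypothetical helper).
    """
    root = ET.fromstring(xml_text)
    items = [{field.tag: field.text for field in item} for item in root]
    return json.dumps({root.tag: items})


xml_text = ('<records><record><him>Maldonado, Gavin G.</him>'
            '<her>Veda Parks</her></record></records>')
print(records_to_json(xml_text))
```

The mapping only works because the structure is known in advance. With repeated field names, attributes, or mixed content, information would be lost, which is why this does not generalize.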

This program may be useful to you in some specific scenarios, but you'd better explicitly state what kind of data you accept and reject anything else. Also, what's the point of this?
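
One way to "reject anything else" is to fail loudly on features the simple mapping cannot represent. The sketch below is only one possible policy; the exception name UnsupportedXML, the function strict_parse, and the choice of what to reject are all assumptions.

```python
import xml.etree.ElementTree as ET


class UnsupportedXML(ValueError):
    """Raised when the input falls outside the accepted subset (hypothetical name)."""


def strict_parse(node):
    """Parse an element, rejecting content the simple mapping would lose."""
    # Non-blank text after the closing tag cannot be represented.
    if node.tail and node.tail.strip():
        raise UnsupportedXML('text after </%s> is not supported' % node.tag)
    children = list(node)
    # An element may hold text or children, but not both.
    if children and node.text and node.text.strip():
        raise UnsupportedXML('mixed content in <%s> is not supported' % node.tag)
    res = {'tag': node.tag}
    if node.attrib:
        res['attributes'] = dict(node.attrib)
    if children:
        res['children'] = [strict_parse(child) for child in children]
    elif node.text is not None:
        res['text'] = node.text
    return res


print(strict_parse(ET.fromstring('<a><b>x</b></a>')))
```

Here whitespace-only text around child elements is treated as insignificant and dropped, which is itself a stated restriction on the accepted input.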

Context

StackExchange Code Review Q#63530, answer score: 3
