pattern python Minor
XML schema parser
parser, schema, xml
Problem
I've been working on a lightweight XML schema parser and so far have what I think is a moderately clean solution for obtaining all schema details (some parts were helped along by previous questions I posted here). I would welcome any criticism at all that could help improve this code further.
Below I have supplied the schema class I wrote, and then an example schema.txt file that the schema class will open if run as main. The schema class calls under "main" can be modified if you want to get a better look at the schema data structure, and I have some accompanying functions I have written for the class to pull out specific details that I haven't put here because I still need to do some work on them.
schema.py:
from lxml import etree

INDICATORS = ["all", "sequence", "choice"]
TYPES = ["simpleType", "complexType"]

class schema:
    def __init__(self, schemafile):
        if schemafile is None:
            print "Error creating Schema: Invalid schema file used"
            return
        self.schema = self.create_schema(etree.parse(schemafile))

    def create_schema(self, schema_data):
        def getXSVal(element): #removes namespace
            return element.tag.split('}')[-1]

        def get_simple_type(element):
            return {
                "name": element.get("name"),
                "restriction": element.getchildren()[0].attrib,
                "elements": [ e.get("value") for e in element.getchildren()[0].getchildren() ]
            }

        def get_simple_content(element):
            return {
                "simpleContent": {
                    "extension": element.getchildren()[0].attrib,
                    "attributes": [ a.attrib for a in element.getchildren()[0].getchildren() ]
                }
            }

        def get_elements(element):
            if len(element.getchildren()) == 0:
                return element.attrib
            data = {}
            ename = element.get("name")
            tag = getXSVal(element)
            if
Solution
from lxml import etree
INDICATORS = ["all", "sequence", "choice"]
TYPES = ["simpleType", "complexType"]
class schema:

Python convention is to name classes using CamelCase.

    def __init__(self, schemafile):
        if schemafile is None:
            print "Error creating Schema: Invalid schema file used"
            return

Use exceptions to report errors in Python. Don't print problems to standard output and then try to continue; nothing good will come of it. Actually, you don't even need to check for None, because it'll fail on the next line anyway.

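Something along these lines is what I have in mind. This is only a rough sketch written for Python 3: it uses the standard library's xml.etree.ElementTree as a stand-in for lxml so that it runs on its own, and SchemaError and load_schema are names made up for illustration, not part of your code.

```python
import io
import xml.etree.ElementTree as ET  # assumption: stdlib stand-in for lxml.etree


class SchemaError(Exception):
    """Hypothetical error type raised when a schema cannot be loaded."""


def load_schema(source):
    """Parse a schema from a file name or file object, or raise SchemaError."""
    try:
        return ET.parse(source)
    except (ET.ParseError, OSError, TypeError) as exc:
        # Raise with context instead of printing and carrying on.
        raise SchemaError("could not load schema: %s" % exc)


# A well-formed document parses normally.
tree = load_schema(io.BytesIO(b"<schema/>"))

# A broken document surfaces as an exception the caller can handle.
try:
    load_schema(io.BytesIO(b"<schema>"))
except SchemaError as exc:
    message = str(exc)
```

The caller then decides what to do about the failure, rather than ending up with a half-constructed object that has no schema attribute.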
        self.schema = self.create_schema(etree.parse(schemafile))

    def create_schema(self, schema_data):
        def getXSVal(element): #removes namespace
            return element.tag.split('}')[-1]

Shouldn't you at least verify that the namespace was correct?

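One way to do that check, as a sketch only: it assumes tags arrive in the "{namespace}name" form and that the schema uses the W3C XML Schema namespace. The names local_name and XSD_NS are hypothetical, chosen for this example.

```python
XSD_NS = "http://www.w3.org/2001/XMLSchema"  # the W3C XML Schema namespace


def local_name(tag, expected_ns=XSD_NS):
    """Return the tag name without its namespace, checking the namespace first.

    Expects tags in the "{namespace}name" form.
    """
    if not tag.startswith("{"):
        raise ValueError("tag %r has no namespace" % tag)
    namespace, _, name = tag[1:].partition("}")
    if namespace != expected_ns:
        raise ValueError("unexpected namespace %r in tag %r" % (namespace, tag))
    return name


name = local_name("{http://www.w3.org/2001/XMLSchema}simpleType")
```

With this, an element from some unrelated namespace is rejected instead of being silently treated as a schema element.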
        def get_simple_type(element):
            return {
                "name": element.get("name"),
                "restriction": element.getchildren()[0].attrib,
                "elements": [ e.get("value") for e in element.getchildren()[0].getchildren() ]
            }

It looks like you are using a dictionary like an object. Perhaps you should actually be creating a SimpleType object with these attributes.

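For instance, a SimpleType could be as small as a namedtuple. The following is a sketch under a few assumptions: it targets Python 3, uses xml.etree.ElementTree in place of lxml so it is self-contained, assumes the first child of the simpleType is its restriction (as your dictionary version does), and the field names and the sample XML are invented for illustration.

```python
import xml.etree.ElementTree as ET  # assumption: stdlib stand-in for lxml.etree
from collections import namedtuple

# Hypothetical field names mirroring the keys of the original dictionary.
SimpleType = namedtuple("SimpleType", ["name", "restriction", "values"])


def parse_simple_type(element):
    """Build a SimpleType from an xs:simpleType element."""
    restriction = element[0]  # assumes the first child is xs:restriction
    return SimpleType(
        name=element.get("name"),
        restriction=dict(restriction.attrib),
        values=[child.get("value") for child in restriction],
    )


SAMPLE = """
<xs:simpleType xmlns:xs="http://www.w3.org/2001/XMLSchema" name="Color">
  <xs:restriction base="xs:string">
    <xs:enumeration value="red"/>
    <xs:enumeration value="green"/>
  </xs:restriction>
</xs:simpleType>
""".strip()

color = parse_simple_type(ET.fromstring(SAMPLE))
print(color.name, color.values)
```

Code that uses the result can then write color.values instead of color["elements"], and a misspelled attribute fails loudly.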
        def get_simple_content(element):
            return {
                "simpleContent": {
                    "extension": element.getchildren()[0].attrib,
                    "attributes": [ a.attrib for a in element.getchildren()[0].getchildren() ]
                }
            }

        def get_elements(element):

I've got no idea what this function is trying to do.

            if len(element.getchildren()) == 0:
                return element.attrib
            data = {}
            ename = element.get("name")
            tag = getXSVal(element)
            if ename is None:

It seems strange that you check for the name, but don't do anything with it.

                if tag == "simpleContent":
                    return get_simple_content(element)

It's confusing the way you sometimes return something and other times add to a dictionary.

                elif tag in INDICATORS:
                    data["indicator"] = tag
                elif tag in TYPES:
                    data["type"] = tag
                else:
                    data["option"] = tag
            else:
                if tag == "simpleType":
                    return get_simple_type(element)
                else:
                    data.update(element.attrib)

I don't really follow what the theory for this condition is. I do see the same code showing up multiple times, which makes me wonder if it can be refactored to be cleaner.

            data["elements"] = []
            data["attributes"] = []
            children = element.getchildren()
            for child in children:

Combine the last two lines.

                if child.get("name") is not None:
                    data[getXSVal(child)+"s"].append(get_elements(child))
                elif tag in INDICATORS and getXSVal(child) in INDICATORS:
                    data["elements"].append(get_elements(child))
                else:
                    data.update(get_elements(child))
            if len(data["elements"]) == 0:
                del data["elements"]
            if len(data["attributes"]) == 0:
                del data["attributes"]

Do you really want to do this? It seems to me that it'll make the code that uses the data harder to write.

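To illustrate what I mean, here is a small sketch with made-up dictionaries (the function names are hypothetical). If the keys are always present, calling code can index directly; once they may be deleted, every consumer has to guard each access.

```python
def count_children(node):
    """Works only when "elements" and "attributes" are always present."""
    return len(node["elements"]) + len(node["attributes"])


def count_children_defensive(node):
    """What callers must write once empty lists may have been deleted."""
    return len(node.get("elements", [])) + len(node.get("attributes", []))


always_present = {"name": "Car", "elements": ["wheel"], "attributes": []}
keys_deleted = {"name": "Car", "elements": ["wheel"]}

total = count_children(always_present)
total_defensive = count_children_defensive(keys_deleted)
```

Calling count_children(keys_deleted) would raise a KeyError, which is the kind of surprise the deletion invites.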
            return data

These long functions as inner functions smell bad. They suggest that perhaps they should be in another class or something.

        schema = {}
        root = schema_data.getroot()
        children = root.getchildren()
        for child in children:
            c_type = getXSVal(child)
            if child.get("name") is not None and not c_type in schema:
                schema[c_type] = []

If the name is None, won't that cause the next line to have an error?

            schema[c_type].append(get_elements(child))

Instead use

            schema.setdefault(c_type,[]).append(get_elements(child))

It'll take care of adding the list the first time you append.

        return schema

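Here is the setdefault idiom in isolation, as a minimal sketch with invented sample data; group_by_kind is a hypothetical helper, not something from your code.

```python
def group_by_kind(items):
    """Group (kind, value) pairs into a dict of lists keyed by kind."""
    grouped = {}
    for kind, value in items:
        # setdefault returns the existing list, or stores and returns a new one.
        grouped.setdefault(kind, []).append(value)
    return grouped


pairs = [("simpleType", "Color"), ("complexType", "Car"), ("simpleType", "Size")]
grouped = group_by_kind(pairs)
```

No separate "is the key there yet?" check is needed, so the KeyError path disappears. collections.defaultdict(list) is another common way to get the same effect.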
    def get_Types(self, t_name):

Python convention is lowercase_with_underscores for method names.

        types = []
        for t in self.schema[t_name]:
            types.append(t["name"])
        return types

I'd use

        return [t["name"] for t in self.schema[t_name]]

    def get_simpleTypes(self):
        return self.get_Types("simpleType")
    def get_complexTypes(self):
        return self.get_Types("complexType")

if __name__ == '__main__':
    fschema = open("schema.txt")

I suggest using a with statement to make sure it gets closed.

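Roughly like this. The sketch is for Python 3, uses xml.etree.ElementTree in place of lxml, and writes a throwaway file so it runs without your schema.txt; load_root is a name invented for the example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET  # assumption: stdlib stand-in for lxml.etree


def load_root(path):
    """Parse the XML file at `path` and return its root element."""
    # The with block closes the file even if parsing raises.
    with open(path, "rb") as handle:
        return ET.parse(handle).getroot()


# Demo on a temporary file so the sketch does not depend on schema.txt.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"<schema/>")
try:
    root = load_root(tmp.name)
finally:
    os.remove(tmp.name)
```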
    schema = schema(fschema)
    print schema.get_simpleTypes()
    print schema.get_complexTypes()

My overall problem with your approach is that you are converting the XML schema into a bunch of unstructured dictionaries. The result isn't going to be much easier to work with than the original XML object.
Code Snippets
from lxml import etree
INDICATORS = ["all", "sequence", "choice"]
TYPES = ["simpleType", "complexType"]
class schema:

    def __init__(self, schemafile):
        if schemafile is None:
            print "Error creating Schema: Invalid schema file used"
            return

        self.schema = self.create_schema(etree.parse(schemafile))

    def create_schema(self, schema_data):
        def getXSVal(element): #removes namespace
            return element.tag.split('}')[-1]

        def get_simple_type(element):
            return {
                "name": element.get("name"),
                "restriction": element.getchildren()[0].attrib,
                "elements": [ e.get("value") for e in element.getchildren()[0].getchildren() ]
            }

        def get_simple_content(element):
            return {
                "simpleContent": {
                    "extension": element.getchildren()[0].attrib,
                    "attributes": [ a.attrib for a in element.getchildren()[0].getchildren() ]
                }
            }

        def get_elements(element):

Context
StackExchange Code Review Q#10960, answer score: 3