patternpythonModerate
#TODO Remove duplication in XML parsing
Viewed 0 times
duplicationxmlremoveparsingtodo
Problem
I have to modify the number of points in this XML in order to test the performance of another program of mine. Here is an example of the XML I have to modify.
performance.xml:
```
Osiris_Control_Center
10
10
32000
calling site
Osiris_Local_Data_0001
State
Telemetered
Osiris_Local_Data_0002
StateQ
Telemetered
Osiris_Local_Data_0003
StateQTimeTag
Telemetered
Osiris_Local_Data_0004
StateExtended
Telemetered
Osiris_Local_Data_0005
StateQTimeTagExtended
Telemetered
Osiris_Local_Data_0006
Real
Telemetered
Osiris_Local_Data_0007
RealQ
Telemetered
Osiris_Local_Data_0008
RealQTimeTag
Telemetered
Osiris_Local_Data_0009
RealExtended
Telemetered
Osiris_Local_Data_0010
RealQTimeTagExtended
Telemetered
Osiris_Local_Data_0011
Discrete
Telemetered
Osiris_Local_Data_0012
DiscreteQ
Telemetered
Osiris_Local_Data_0013
DiscreteQTimeTag
Telemetered
Osiris_Local_Data_0014
DiscreteExtended
Telemetered
Osiris
performance.xml:
```
Osiris_Control_Center
10
10
32000
calling site
Osiris_Local_Data_0001
State
Telemetered
Osiris_Local_Data_0002
StateQ
Telemetered
Osiris_Local_Data_0003
StateQTimeTag
Telemetered
Osiris_Local_Data_0004
StateExtended
Telemetered
Osiris_Local_Data_0005
StateQTimeTagExtended
Telemetered
Osiris_Local_Data_0006
Real
Telemetered
Osiris_Local_Data_0007
RealQ
Telemetered
Osiris_Local_Data_0008
RealQTimeTag
Telemetered
Osiris_Local_Data_0009
RealExtended
Telemetered
Osiris_Local_Data_0010
RealQTimeTagExtended
Telemetered
Osiris_Local_Data_0011
Discrete
Telemetered
Osiris_Local_Data_0012
DiscreteQ
Telemetered
Osiris_Local_Data_0013
DiscreteQTimeTag
Telemetered
Osiris_Local_Data_0014
DiscreteExtended
Telemetered
Osiris
Solution
Low hanging fruit
Most of this will come down to reading PEP8, however I'll state them here anyway.
From the top down:
Rather than one as you have now.
Removing repetitive code
At first I was doing some funky stuff to remove your duplicate code, but you can keep it simple.
Just use generators, and make a few more functions.
I'd make four more functions to reduce the repetition of your code.
You can also make
And so you can use this very small function:
This will allow us to remove
After this is where we'll remove more of the duplicate code.
By first getting a parent, we can then use a generator we can lazily get all the items.
This is used in most of the other functions.
This can be just be finding the node, and then the for loop with a
This will allow us to change all the
Allowing
After this we can change all the
This is as you mostly only change the nodes that you visit, being
And so I'd make a function that takes these nodes names and does the repeated code.
To make the return easier to use I'd use
This can result in:
And allows us to change
After this, I'd change how you call
Instead of manually calling it a lot, you can use a for loop.
And you can do the same for all the
This removes most of the duplicate code.
Finally I'd use
This is as it checks the input for you so you can make it check if
Rather than Python abruptly exiting with a
You can also add help information which will display if you use
and it also automates your message
The only downside to this is you'll need to read more documentation to add more features to the argument input.
All the above changes resulted in me getting:
```
from xml.etree import ElementTree as etree
from collections import namedtuple
import argparse
branch = namedtuple('Branch', ['val', 'item', 'name', 'type', 'source'])
def set_obj_text(node, num, obj):
node.find(obj).text = str(num)
def get_items(node, num, parent, item):
parent_ = node.find(parent)
parent_.set('Count', str(num))
Most of this will come down to reading PEP8, however I'll state them here anyway.
From the top down:
- If statements don't need brackets, so don't put them around them.
- Functions at module level should have two new lines between them and other things.
Rather than one as you have now.
- You should use
snake_casefor functions and variables.
- You can use a dictionary comprehension, rather than a generator comprehension fed to a dict constructor.
- You should put a space both sides of operators,
num + 1, the only exception to this is to show precedence.
- Don't overwrite builtins
type. Instead you can usetype_or a synonym.
- Put all imports at the top of the script.
- Use
withrather than manually opening and closing.
- Use a
mainfunction, this keeps things out of the global scope.
Removing repetitive code
At first I was doing some funky stuff to remove your duplicate code, but you can keep it simple.
Just use generators, and make a few more functions.
I'd make four more functions to reduce the repetition of your code.
You can also make
set_obj_text which changes the attribute text.And so you can use this very small function:
def set_obj_text(node, num, obj):
node.find(obj).text = str(num)This will allow us to remove
editInterval and instead use set_obj_text(tree, num, './/Interval').After this is where we'll remove more of the duplicate code.
By first getting a parent, we can then use a generator we can lazily get all the items.
This is used in most of the other functions.
This can be just be finding the node, and then the for loop with a
yield of etree.SubElement.def get_items(node, num, parent, item):
parent_ = node.find(parent)
parent_.set('Count', str(num))
for i in xrange(1, num + 1):
yield i, etree.SubElement(parent_, item)This will allow us to change all the
add*Branch functions.Allowing
add_data_set_branch to be:def add_data_set_branch(node, num):
for val, item in get_items(node, num, './/CdsVars', 'CdsVar'):
item.set('Scope', 'ICC')
item.set('Type', 'Dv')
item.set('Name', 'Osiris_Test_Data_' + str(val))After this we can change all the
add*Branch except add_data_set_branch to remove more duplicate code.This is as you mostly only change the nodes that you visit, being
parent, item, name, name.text, source and the source.text.And so I'd make a function that takes these nodes names and does the repeated code.
To make the return easier to use I'd use
collections.namedtuple so that we can use ret.name rather than say ret[2].This can result in:
def walk_branch(node, num, custom_values):
v = custom_values.get
for val, item in get_items(node, num, v('parent'), v('item')):
name = etree.SubElement(item, v('name'))
name.text = v('name_text') + str(val)
type_ = etree.SubElement(item, 'DataType')
type_.text = 'RealQTimeTagExtended'
source = etree.SubElement(item, v('source'))
source.text = v('source_text')
yield branch(val, item, name, type_, source)And allows us to change
add_server_data_branch to:def add_server_data_branch(node, num):
set_obj_text(node, num, './/ServerObjects/NumIccDv')
for ret in walk_branch(
node, num, {
'parent': './/ServerDataValues',
'item': 'Sdv',
'name': 'ObjName',
'name_text': 'Osiris_Local_Data_',
'source': 'ReadOnly',
'source_text': 'Y',
}):
ret.name.set('Scope', 'ICC')After this, I'd change how you call
remove_branch.Instead of manually calling it a lot, you can use a for loop.
And you can do the same for all the
data*branchs.for branch in ['Ldv', 'Sdv', 'Cdv', 'CdsVar']:
remove_branch(tree, branch)
for fn in [add_local_data_branch, add_server_data_branch,
add_client_data_branch, add_data_set_branch]:
fn(tree, args.points)This removes most of the duplicate code.
Finally I'd use
argparse rather than sys.argv.This is as it checks the input for you so you can make it check if
points is a number and it exit with a message about it.Rather than Python abruptly exiting with a
ValueError.You can also add help information which will display if you use
file.py -h,and it also automates your message
'Usage: python setup.py [xml] [points] [time]'.The only downside to this is you'll need to read more documentation to add more features to the argument input.
All the above changes resulted in me getting:
```
from xml.etree import ElementTree as etree
from collections import namedtuple
import argparse
branch = namedtuple('Branch', ['val', 'item', 'name', 'type', 'source'])
def set_obj_text(node, num, obj):
node.find(obj).text = str(num)
def get_items(node, num, parent, item):
parent_ = node.find(parent)
parent_.set('Count', str(num))
Code Snippets
def set_obj_text(node, num, obj):
node.find(obj).text = str(num)def get_items(node, num, parent, item):
parent_ = node.find(parent)
parent_.set('Count', str(num))
for i in xrange(1, num + 1):
yield i, etree.SubElement(parent_, item)def add_data_set_branch(node, num):
for val, item in get_items(node, num, './/CdsVars', 'CdsVar'):
item.set('Scope', 'ICC')
item.set('Type', 'Dv')
item.set('Name', 'Osiris_Test_Data_' + str(val))def walk_branch(node, num, custom_values):
v = custom_values.get
for val, item in get_items(node, num, v('parent'), v('item')):
name = etree.SubElement(item, v('name'))
name.text = v('name_text') + str(val)
type_ = etree.SubElement(item, 'DataType')
type_.text = 'RealQTimeTagExtended'
source = etree.SubElement(item, v('source'))
source.text = v('source_text')
yield branch(val, item, name, type_, source)def add_server_data_branch(node, num):
set_obj_text(node, num, './/ServerObjects/NumIccDv')
for ret in walk_branch(
node, num, {
'parent': './/ServerDataValues',
'item': 'Sdv',
'name': 'ObjName',
'name_text': 'Osiris_Local_Data_',
'source': 'ReadOnly',
'source_text': 'Y',
}):
ret.name.set('Scope', 'ICC')Context
StackExchange Code Review Q#134065, answer score: 15
Revisions (0)
No revisions yet.