HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonModerate

#TODO Remove duplication in XML parsing

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
duplicationxmlremoveparsingtodo

Problem

I have to modify the number of points in this XML in order to test the performance of another program of mine. Here is an example of the XML I have to modify.

performance.xml:

```

Osiris_Control_Center
10
10
32000
calling site



Osiris_Local_Data_0001
State
Telemetered


Osiris_Local_Data_0002
StateQ
Telemetered


Osiris_Local_Data_0003
StateQTimeTag
Telemetered


Osiris_Local_Data_0004
StateExtended
Telemetered


Osiris_Local_Data_0005
StateQTimeTagExtended
Telemetered


Osiris_Local_Data_0006
Real
Telemetered


Osiris_Local_Data_0007
RealQ
Telemetered


Osiris_Local_Data_0008
RealQTimeTag
Telemetered


Osiris_Local_Data_0009
RealExtended
Telemetered


Osiris_Local_Data_0010
RealQTimeTagExtended
Telemetered


Osiris_Local_Data_0011
Discrete
Telemetered


Osiris_Local_Data_0012
DiscreteQ
Telemetered


Osiris_Local_Data_0013
DiscreteQTimeTag
Telemetered


Osiris_Local_Data_0014
DiscreteExtended
Telemetered


Osiris

Solution

Low hanging fruit

Most of this will come down to reading PEP8, however I'll state them here anyway.
From the top down:

  • If statements don't need brackets, so don't put them around them.



  • Functions at module level should have two new lines between them and other things.


Rather than one as you have now.

  • You should use snake_case for functions and variables.



  • You can use a dictionary comprehension, rather than a generator comprehension fed to a dict constructor.



  • You should put a space both sides of operators, num + 1, the only exception to this is to show precedence.



  • Don't overwrite builtins type. Instead you can use type_ or a synonym.



  • Put all imports at the top of the script.



  • Use with rather than manually opening and closing.



  • Use a main function, this keeps things out of the global scope.



Removing repetitive code

At first I was doing some funky stuff to remove your duplicate code, but you can keep it simple.
Just use generators, and make a few more functions.
I'd make four more functions to reduce the repetition of your code.

You can also make set_obj_text which changes the attribute text.
And so you can use this very small function:

def set_obj_text(node, num, obj):
    node.find(obj).text = str(num)


This will allow us to remove editInterval and instead use set_obj_text(tree, num, './/Interval').

After this is where we'll remove more of the duplicate code.
By first getting a parent, we can then use a generator we can lazily get all the items.
This is used in most of the other functions.
This can be just be finding the node, and then the for loop with a yield of etree.SubElement.

def get_items(node, num, parent, item):
    parent_ = node.find(parent)
    parent_.set('Count', str(num))
    for i in xrange(1, num + 1):
        yield i, etree.SubElement(parent_, item)


This will allow us to change all the add*Branch functions.
Allowing add_data_set_branch to be:

def add_data_set_branch(node, num):
    for val, item in get_items(node, num, './/CdsVars', 'CdsVar'):
        item.set('Scope', 'ICC')
        item.set('Type', 'Dv')
        item.set('Name', 'Osiris_Test_Data_' + str(val))


After this we can change all the add*Branch except add_data_set_branch to remove more duplicate code.
This is as you mostly only change the nodes that you visit, being parent, item, name, name.text, source and the source.text.
And so I'd make a function that takes these nodes names and does the repeated code.
To make the return easier to use I'd use collections.namedtuple so that we can use ret.name rather than say ret[2].
This can result in:

def walk_branch(node, num, custom_values):
    v = custom_values.get
    for val, item in get_items(node, num, v('parent'), v('item')):
        name = etree.SubElement(item, v('name'))
        name.text = v('name_text') + str(val)
        type_ = etree.SubElement(item, 'DataType')
        type_.text = 'RealQTimeTagExtended'
        source = etree.SubElement(item, v('source'))
        source.text = v('source_text')
        yield branch(val, item, name, type_, source)


And allows us to change add_server_data_branch to:

def add_server_data_branch(node, num):
    set_obj_text(node, num, './/ServerObjects/NumIccDv')
    for ret in walk_branch(
            node, num, {
                'parent': './/ServerDataValues',
                'item': 'Sdv',
                'name': 'ObjName',
                'name_text': 'Osiris_Local_Data_',
                'source': 'ReadOnly',
                'source_text': 'Y',
            }):
        ret.name.set('Scope', 'ICC')


After this, I'd change how you call remove_branch.
Instead of manually calling it a lot, you can use a for loop.
And you can do the same for all the data*branchs.

for branch in ['Ldv', 'Sdv', 'Cdv', 'CdsVar']:
    remove_branch(tree, branch)
for fn in [add_local_data_branch, add_server_data_branch,
           add_client_data_branch, add_data_set_branch]:
    fn(tree, args.points)


This removes most of the duplicate code.

Finally I'd use argparse rather than sys.argv.
This is as it checks the input for you so you can make it check if points is a number and it exit with a message about it.
Rather than Python abruptly exiting with a ValueError.
You can also add help information which will display if you use file.py -h,
and it also automates your message 'Usage: python setup.py [xml] [points] [time]'.
The only downside to this is you'll need to read more documentation to add more features to the argument input.

All the above changes resulted in me getting:

```
from xml.etree import ElementTree as etree
from collections import namedtuple
import argparse

branch = namedtuple('Branch', ['val', 'item', 'name', 'type', 'source'])

def set_obj_text(node, num, obj):
node.find(obj).text = str(num)

def get_items(node, num, parent, item):
parent_ = node.find(parent)
parent_.set('Count', str(num))

Code Snippets

def set_obj_text(node, num, obj):
    node.find(obj).text = str(num)
def get_items(node, num, parent, item):
    parent_ = node.find(parent)
    parent_.set('Count', str(num))
    for i in xrange(1, num + 1):
        yield i, etree.SubElement(parent_, item)
def add_data_set_branch(node, num):
    for val, item in get_items(node, num, './/CdsVars', 'CdsVar'):
        item.set('Scope', 'ICC')
        item.set('Type', 'Dv')
        item.set('Name', 'Osiris_Test_Data_' + str(val))
def walk_branch(node, num, custom_values):
    v = custom_values.get
    for val, item in get_items(node, num, v('parent'), v('item')):
        name = etree.SubElement(item, v('name'))
        name.text = v('name_text') + str(val)
        type_ = etree.SubElement(item, 'DataType')
        type_.text = 'RealQTimeTagExtended'
        source = etree.SubElement(item, v('source'))
        source.text = v('source_text')
        yield branch(val, item, name, type_, source)
def add_server_data_branch(node, num):
    set_obj_text(node, num, './/ServerObjects/NumIccDv')
    for ret in walk_branch(
            node, num, {
                'parent': './/ServerDataValues',
                'item': 'Sdv',
                'name': 'ObjName',
                'name_text': 'Osiris_Local_Data_',
                'source': 'ReadOnly',
                'source_text': 'Y',
            }):
        ret.name.set('Scope', 'ICC')

Context

StackExchange Code Review Q#134065, answer score: 15

Revisions (0)

No revisions yet.