HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Python XML parsing, extraction and renaming files

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
extractionxmlfilesparsingpythonrenamingand

Problem

I've written some code to:

  • Parse an XML file.



  • Extract some strings.



  • Make copies of the original XML whose names are suffixed with the extracted strings.



Please review this.

from lxml import etree as ET
import re

def create_paramsuffix_copies(xmlfile):

    NSMAP = {"c": "http://www.copasi.org/static/schema"}

    parsed = ET.parse(xmlfile)

    list_of_params = []

    for a in parsed.xpath("//c:Reaction", namespaces=NSMAP):
        for b in a.xpath(".//c:Constant", namespaces=NSMAP):
            list_of_params.append((a.attrib['name'])+'_'+(b.attrib['name']))

    return list_of_params

def add_suffix(xmlfile, name_suffix):

    parsed = ET.parse(xmlfile) #parse xml

    xmlfile = re.sub('.cps', '', xmlfile) #remove suffix

    parsed.write(xmlfile+'_'+name_suffix+'.cps', 
                 xml_declaration = True,
                 encoding = 'utf-8',
                 method = 'xml')

modelfile = 'ct.cps'

for a in create_paramsuffix_copies(modelfile):
    add_suffix(modelfile,a)

Solution

Some style changes can happen, most are described here.

  • Using single letter variable names can be hard to understand, and don't explain what the data is well. It is good not to use them.



  • str.format is the best way to put variables into strings. E.G. '{}_{}.cps'.format(xmlfile, name_suffix)



  • CONSTANTS should be all caps. MODEL_FILE = 'ct.cps'



  • When passing 'kwargs' don't put spaces around the = operator. E.G. encoding='utf-8'



  • Use new-lines sparingly.



  • In-line comments are harder to read then comments before the code.



  • Comments should have a space between the '#' and the comment.



It may be good to change create_paramsuffix_copies to a generator, in this case it should be faster than making a list.

def create_paramsuffix_copies(xmlfile):
    NSMAP = {"c": "http://www.copasi.org/static/schema"}
    parsed = ET.parse(xmlfile)
    for a in parsed.xpath("//c:Reaction", namespaces=NSMAP):
        for b in a.xpath(".//c:Constant", namespaces=NSMAP):
            yield '{}_{}'.format(a.attrib['name'], b.attrib['name'])


You could add docstrings to your program so people know what the functions do at a later date.

But it looks really good!

Code Snippets

def create_paramsuffix_copies(xmlfile):
    NSMAP = {"c": "http://www.copasi.org/static/schema"}
    parsed = ET.parse(xmlfile)
    for a in parsed.xpath("//c:Reaction", namespaces=NSMAP):
        for b in a.xpath(".//c:Constant", namespaces=NSMAP):
            yield '{}_{}'.format(a.attrib['name'], b.attrib['name'])

Context

StackExchange Code Review Q#96025, answer score: 7

Revisions (0)

No revisions yet.