HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Parsing XML in Python

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
parsingpythonxml

Problem

The aim of this code is to return an object which contains all of the movies. Where attributes are not found, they need to return None rather than be undefined and raise an exception.

attrs is a list of all possible properties for that class. Each class loops through and tries to get the attribute. If the attribute is not there, setattr(self,attr,video.attrib.get(attr)) will return None.

The end result is an object which can be referenced like so:

Movielist.movies[0].Video.title would return "30 Minutes Or Less".

The XML structure is as follows:

  • Each movie is returned as an XML `



  • Each movie has a subsection which may contain multiple elements



  • Within each subsection are subsections. There may also be multiple entries here.



I'm sure there must be a better / simpler way of doing this.

A sample of the Python source code follows, along with a sample of XML code returned / scanned by the
Movielist class after that:

import httplib
import xml.etree.ElementTree as ElementTree

def getUrl(server,url):
c=httplib.HTTPConnection(server.ip,server.port)
c.request("GET",url)
r=c.getresponse()
return r.read()

def getxml(server,serverPath,tag):
responseData=getUrl(server,serverPath)
e=ElementTree.XML(responseData)
if not tag:
return e
else:
return e.findall(tag)

class Video():
class Media():
class Part():
def __init__(self,part):
attrs=['key',
'duration',
'file',
'size',
]
for attr in attrs:
setattr(self,attr,part.attrib.get(attr))

def __init__(self,media):
attrs=['id',
'duration',
'bitrate',
'width',
'height',
'aspectRatio',
'audioChannels',
'audioCodec',
'vide

Solution

import httplib
import xml.etree.ElementTree as ElementTree

def getUrl(server,url):
    c=httplib.HTTPConnection(server.ip,server.port)
    c.request("GET",url)
    r=c.getresponse()
    return r.read()


Firstly, the function doesn't really match its name. getUrl makes me think it should return a URL, not take my url and download it. Secondly, there is function urllib.urlopen which does everything this function does minus the last line. And it takes a url as an argument, so you can combine the two parameters here.

def getxml(server,serverPath,tag):
    responseData=getUrl(server,serverPath)
    e=ElementTree.XML(responseData)


But some spaces around those =. I also think you shouldn't use single letter variables name (usually).

if not tag:


When checking for None I recommend tag is not None just to be explicit.

return e
    else:
        return e.findall(tag)

class Video():


Either drop the parens here, or put object in them.

class Media():
        class Part():


I'm not a fan of nesting classes (usually). I'd separate them out.

def __init__(self,part):
                attrs=['key',
                       'duration',
                       'file',
                       'size',
                       ]
                for attr in attrs:
                    setattr(self,attr,part.attrib.get(attr))


I would recommend against loading from XML in your classes' constructor. It ties your internal objects too much into the structure of the xml file. I'd suggest having external function which extract the data from the XML and then pass it into the constructor.

def __init__(self,media):


Here's the reason why nesting classes is problematic. Its hard to tell what class I'm in now.

attrs=['id',
                   'duration',
                   'bitrate',
                   'width',
                   'height',
                   'aspectRatio',
                   'audioChannels',
                   'audioCodec',
                   'videoCodec',
                   'videoResolution',
                   'container',
                   'videoFrameRate',
                   'optimizedForStreaming',
                   ]
            for attr in attrs:
                setattr(self,attr,media.attrib.get(attr))


We see this piece of code exactly in the previous constructor. You should move it into a function.

xmlParts=media.findall("Part")
            self.parts=[]
            for part in xmlParts:
                p=self.Part(part)
                self.parts.append(p)


Instead you can do self.parts = map(self.Part, media.findall("Part")). That will make this less complicated here.

def __init__(self,server,video):
        # Get the detail
        self.key=video.attrib.get('key')


Do you really want to get None here if the key is missing?

x=getxml(server,self.key,"Video")
        video=x[0]


Any reason not to combine those two lines? Your constructor fetches the data from the server. As with the xml parsing I think this is best done outside of the constructor.

self.video=video


Why do you want to keep a reference the xml node? Shouldn't you just toss it once you've finished reading from it?

attrs=['ratingKey',
               'key',
               'studio',
               'type',
               'title',
               'contentRating',
               'summary',
               'rating',
               'year',
               'tagline',
               'thumb',
               'art',
               'duration',
               'originallyAvailableAt',
               'addedAt',
               'updatedAt',
               'titleSort',
               'viewCount',
               'viewOffset',
               'guid',
               'lastViewedAt',
               'index',
               ]
        for attr in attrs:
            setattr(self,attr,video.attrib.get(attr))

        self.media=None
        for subitem in self.video:
            if subitem.tag=="Media":
                self.media=self.Media(subitem)


Is there a find function or something to make this simpler?

class Server:


The class name should probably be more specific then just Server

def __init__(self,ip,port):
        self.ip=ip
        self.port=port

    def movielisturl(self,key):
        return "/library/sections/%s/all" % key


I'd probably have the server class define a method that returns the XML rather then just providing the URL. That the way the server can actually fetch the XML anyway it likes not just a http connection.

class Movielist:

    def __init__(self,server,key):
        x=getxml(server,server.movielisturl(key),"Video")
        print x
        self.movies=[]
        for item in x:
            m=Video(server,item)
            self.movies.append(m)


I wouldn't have a MovieList class. I'd just have a list of movies. I'd have a function that returns a list of movies from a particular server.

```
def main():
s=

Code Snippets

import httplib
import xml.etree.ElementTree as ElementTree

def getUrl(server,url):
    c=httplib.HTTPConnection(server.ip,server.port)
    c.request("GET",url)
    r=c.getresponse()
    return r.read()
def getxml(server,serverPath,tag):
    responseData=getUrl(server,serverPath)
    e=ElementTree.XML(responseData)
if not tag:
return e
    else:
        return e.findall(tag)

class Video():
class Media():
        class Part():

Context

StackExchange Code Review Q#7044, answer score: 4

Revisions (0)

No revisions yet.