patternpythonMinor
Parsing XML in Python
Viewed 0 times
parsingpythonxml
Problem
The aim of this code is to return an object which contains all of the movies. Where attributes are not found, they need to return
The end result is an object which can be referenced like so:
The XML structure is as follows:
I'm sure there must be a better / simpler way of doing this.
A sample of the Python source code follows, along with a sample of XML code returned / scanned by the Movielist
import xml.etree.ElementTree as ElementTree
def getUrl(server,url):
c=httplib.HTTPConnection(server.ip,server.port)
c.request("GET",url)
r=c.getresponse()
return r.read()
def getxml(server,serverPath,tag):
responseData=getUrl(server,serverPath)
e=ElementTree.XML(responseData)
if not tag:
return e
else:
return e.findall(tag)
class Video():
class Media():
class Part():
def __init__(self,part):
attrs=['key',
'duration',
'file',
'size',
]
for attr in attrs:
setattr(self,attr,part.attrib.get(attr))
def __init__(self,media):
attrs=['id',
'duration',
'bitrate',
'width',
'height',
'aspectRatio',
'audioChannels',
'audioCodec',
'vide
None rather than be undefined and raise an exception.attrs is a list of all possible properties for that class. Each class loops through and tries to get the attribute. If the attribute is not there, setattr(self,attr,video.attrib.get(attr)) will return None.The end result is an object which can be referenced like so:
Movielist.movies[0].Video.title would return "30 Minutes Or Less".The XML structure is as follows:
- Each movie is returned as an XML `
- Each movie has a
subsection which may contain multiple elements
- Within each
subsection aresubsections. There may also be multiple entries here.
I'm sure there must be a better / simpler way of doing this.
A sample of the Python source code follows, along with a sample of XML code returned / scanned by the Movielist
class after that:
import httplibimport xml.etree.ElementTree as ElementTree
def getUrl(server,url):
c=httplib.HTTPConnection(server.ip,server.port)
c.request("GET",url)
r=c.getresponse()
return r.read()
def getxml(server,serverPath,tag):
responseData=getUrl(server,serverPath)
e=ElementTree.XML(responseData)
if not tag:
return e
else:
return e.findall(tag)
class Video():
class Media():
class Part():
def __init__(self,part):
attrs=['key',
'duration',
'file',
'size',
]
for attr in attrs:
setattr(self,attr,part.attrib.get(attr))
def __init__(self,media):
attrs=['id',
'duration',
'bitrate',
'width',
'height',
'aspectRatio',
'audioChannels',
'audioCodec',
'vide
Solution
import httplib
import xml.etree.ElementTree as ElementTree
def getUrl(server,url):
c=httplib.HTTPConnection(server.ip,server.port)
c.request("GET",url)
r=c.getresponse()
return r.read()Firstly, the function doesn't really match its name. getUrl makes me think it should return a URL, not take my url and download it. Secondly, there is function urllib.urlopen which does everything this function does minus the last line. And it takes a url as an argument, so you can combine the two parameters here.
def getxml(server,serverPath,tag):
responseData=getUrl(server,serverPath)
e=ElementTree.XML(responseData)But some spaces around those =. I also think you shouldn't use single letter variables name (usually).
if not tag:When checking for None I recommend
tag is not None just to be explicit. return e
else:
return e.findall(tag)
class Video():Either drop the parens here, or put
object in them. class Media():
class Part():I'm not a fan of nesting classes (usually). I'd separate them out.
def __init__(self,part):
attrs=['key',
'duration',
'file',
'size',
]
for attr in attrs:
setattr(self,attr,part.attrib.get(attr))I would recommend against loading from XML in your classes' constructor. It ties your internal objects too much into the structure of the xml file. I'd suggest having external function which extract the data from the XML and then pass it into the constructor.
def __init__(self,media):Here's the reason why nesting classes is problematic. Its hard to tell what class I'm in now.
attrs=['id',
'duration',
'bitrate',
'width',
'height',
'aspectRatio',
'audioChannels',
'audioCodec',
'videoCodec',
'videoResolution',
'container',
'videoFrameRate',
'optimizedForStreaming',
]
for attr in attrs:
setattr(self,attr,media.attrib.get(attr))We see this piece of code exactly in the previous constructor. You should move it into a function.
xmlParts=media.findall("Part")
self.parts=[]
for part in xmlParts:
p=self.Part(part)
self.parts.append(p)Instead you can do
self.parts = map(self.Part, media.findall("Part")). That will make this less complicated here.def __init__(self,server,video):
# Get the detail
self.key=video.attrib.get('key')Do you really want to get None here if the key is missing?
x=getxml(server,self.key,"Video")
video=x[0]Any reason not to combine those two lines? Your constructor fetches the data from the server. As with the xml parsing I think this is best done outside of the constructor.
self.video=videoWhy do you want to keep a reference the xml node? Shouldn't you just toss it once you've finished reading from it?
attrs=['ratingKey',
'key',
'studio',
'type',
'title',
'contentRating',
'summary',
'rating',
'year',
'tagline',
'thumb',
'art',
'duration',
'originallyAvailableAt',
'addedAt',
'updatedAt',
'titleSort',
'viewCount',
'viewOffset',
'guid',
'lastViewedAt',
'index',
]
for attr in attrs:
setattr(self,attr,video.attrib.get(attr))
self.media=None
for subitem in self.video:
if subitem.tag=="Media":
self.media=self.Media(subitem)Is there a find function or something to make this simpler?
class Server:The class name should probably be more specific then just
Serverdef __init__(self,ip,port):
self.ip=ip
self.port=port
def movielisturl(self,key):
return "/library/sections/%s/all" % keyI'd probably have the server class define a method that returns the XML rather then just providing the URL. That the way the server can actually fetch the XML anyway it likes not just a http connection.
class Movielist:
def __init__(self,server,key):
x=getxml(server,server.movielisturl(key),"Video")
print x
self.movies=[]
for item in x:
m=Video(server,item)
self.movies.append(m)I wouldn't have a MovieList class. I'd just have a list of movies. I'd have a function that returns a list of movies from a particular server.
```
def main():
s=
Code Snippets
import httplib
import xml.etree.ElementTree as ElementTree
def getUrl(server,url):
c=httplib.HTTPConnection(server.ip,server.port)
c.request("GET",url)
r=c.getresponse()
return r.read()def getxml(server,serverPath,tag):
responseData=getUrl(server,serverPath)
e=ElementTree.XML(responseData)if not tag:return e
else:
return e.findall(tag)
class Video():class Media():
class Part():Context
StackExchange Code Review Q#7044, answer score: 4
Revisions (0)
No revisions yet.