debug · python · Minor

Parsing a website

Submitted by: @import:stackexchange-codereview
website · parsing · stackoverflow

Problem

The following is the code I wrote to download information about the different items on a page.

I have one main page that links to the different items. I parse this main page to get the list of items; this is handled by the `Items` class.

I then parse each of the links in that list using the `Item` class.

I implemented a `Handler` class that serves as the base class for both of these classes.
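To illustrate the list-extraction step that `Items.extractContents` performs, here is a minimal Python 3 sketch that swaps BeautifulSoup for the standard library's `html.parser.HTMLParser`. Like the original code, it assumes the item links live inside `<ul class="galerie">`; the class name `GalleryLinkParser` and the sample HTML are hypothetical and only serve the illustration.

```python
from html.parser import HTMLParser


class GalleryLinkParser(HTMLParser):
    """Collect the href of every <a> tag found inside <ul class="galerie">."""

    def __init__(self):
        super().__init__()
        self.in_gallery = False  # are we currently inside the gallery <ul>?
        self.ul_depth = 0        # <ul> nesting depth, counted from the gallery <ul>
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul":
            if self.in_gallery:
                self.ul_depth += 1
            elif "galerie" in (attrs.get("class") or "").split():
                self.in_gallery = True
                self.ul_depth = 1
        elif tag == "a" and self.in_gallery and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "ul" and self.in_gallery:
            self.ul_depth -= 1
            if self.ul_depth == 0:  # the gallery <ul> itself just closed
                self.in_gallery = False


# Hypothetical sample page: only links inside the gallery should be collected.
html = """
<ul class="nav"><li><a href="/home">Home</a></li></ul>
<ul class="galerie">
  <li><a href="/item/1">One</a></li>
  <li><a href="/item/2">Two</a></li>
</ul>
"""
parser = GalleryLinkParser()
parser.feed(html)
print(parser.links)  # ['/item/1', '/item/2']
```

Each collected link would then be handed to something like the `Item` class for the per-item parsing step.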

```
import os
import urllib  # Python 2; in Python 3 this is urllib.request.urlopen

from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 (the import was not shown)


# Base class: downloads a page, parses it and creates a data folder for it
class Handler:

    def __init__(self, url):
        self.url = url
        self.property = {}
        self.homeDir = os.path.dirname(__file__)
        self.parser = self.getParser()
        self.name = self.getTitle()
        self.setupFolder()

    def updateName(self, name):
        self.name = name

    def setupFolder(self):
        dataDir = os.path.join(self.homeDir, self.name)
        if not os.path.exists(dataDir):
            os.makedirs(dataDir)

    def getTitle(self):
        # Keep only letters, digits and spaces so the title is a safe folder name
        return "".join(char
                       for char in self.parser.title.string
                       if char.isalnum() or char == " ")

    def getFilePath(self):
        return os.path.join(self.homeDir, self.name)

    def getRequest(self, url):
        return urllib.urlopen(url).read()

    def getParser(self):
        parser = BeautifulSoup(self.getRequest(self.url))
        return parser

    def saveProperty(self, key, value):
        self.property[key] = value

    def writeProperty(self):
        # Writes one "key:value" line per saved property
        fileName = self.getTitle() + ".property"
        with open(fileName, 'w') as f:
            f.write("\n".join(
                key + ":" + self.property[key]
                for key in self.property))

# Parses the main page and extracts the list of items
class Items(Handler, object):

    def __init__(self, url, category) :

        super(Items, self).__init__(url)
        self.category = category
        self.updateName(category)

    def extractContents(self):
        self.parser = self.getParser()
        contents = self.parser.find("ul",{"class" : "galerie"}).findAll('li')
        print len(contents)
        return contents

    def downloadContents(self):
        pass  # body omitted
```

Solution

The general naming style in Python is snake_case for functions and variables, and PascalCase for classes; for example, `getTitle` would become `get_title` and `homeDir` would become `home_dir`. You should also have two blank lines between top-level functions, classes and other top-level blocks, not an arbitrary number. You have a few other style violations as well; PEP 8, Python's official style guide, covers how to fix them.
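As a minimal Python 3 sketch of what this naming advice looks like when applied to part of `Handler`, here is an offline fragment that also uses two blank lines between top-level definitions. It deliberately leaves out downloading and parsing; the helper `sanitize_title`, the `title`/`home_dir` constructor parameters and the sample title are hypothetical and not part of the original code.

```python
import os


def sanitize_title(title):
    # Same filter as the question's getTitle(): keep letters, digits and spaces
    return "".join(char for char in title if char.isalnum() or char == " ")


class Handler:
    # Offline fragment of the question's Handler, renamed to PEP 8 style.
    # It takes an already-known title instead of downloading and parsing a page.

    def __init__(self, title, home_dir):
        self.property = {}
        self.home_dir = home_dir
        self.name = sanitize_title(title)

    def update_name(self, name):
        self.name = name

    def get_file_path(self):
        return os.path.join(self.home_dir, self.name)

    def save_property(self, key, value):
        self.property[key] = value


handler = Handler("My Page: Items!", "data")
print(handler.name)  # My Page Items
print(handler.get_file_path())
```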

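Secondly, it looks like you're using Python 2.x. In Python 2.x, classes should explicitly inherit from `object` so that they are new-style classes: write `class MyClass(object):`, not `class MyClass:`. Your `Items(Handler, object)` works around this for `Items`, but it is cleaner to make `Handler` itself inherit from `object`. In Python 3.x every class is new-style, so the shorter `class MyClass:` form is fine.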
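A stripped-down sketch of the suggested hierarchy, keeping only the `url` and `category` attributes and no network access; the URL and category values are placeholders. The class definitions are valid in both Python 2 and Python 3, and the rest of the snippet runs under Python 3.

```python
class Handler(object):  # explicit base: new-style in Python 2, harmless in Python 3
    def __init__(self, url):
        self.url = url


class Items(Handler):  # inherits object through Handler, no need to repeat it
    def __init__(self, url, category):
        # The two-argument super() form works in both Python 2 and Python 3
        super(Items, self).__init__(url)
        self.category = category


items = Items("http://example.com/", "shoes")
print(Items.__mro__)
```

With `object` declared once on the base class, every subclass is new-style and `super()` resolves through the method resolution order as expected.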

Finally, you can use string multiplication to print a run of repeated characters. For example, the line `print "==============================="` can be shortened to `print "=" * 31`.
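In Python 3, where `print` is a function, the same idea looks like this; the variable name `separator` is only illustrative.

```python
# Build the separator once instead of typing 31 "=" characters by hand
separator = "=" * 31
print(separator)
print(len(separator))  # 31
```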

Context

StackExchange Code Review Q#49754, answer score: 2
