wxPython Item Information Scraper
Problem
- What is the best method/practice to keep, let's say, over 3,000 lines of code organized for readability?
- What kind of NO-NOs in regards to coding habits should I get rid of before they become bad habits?
- Inversely what kind of good coding habits should I be picking up?
- Any glaringly obvious mistakes? Any helpful links for further reading would be appreciated.
I am including code that I have been working on for the past couple weeks. Basically it will help to increase the efficiency of my workflow from pulling inventory off trucks, to getting it to where it needs to go.
The following code navigates to a website and extracts information about an Item.
```
# -*- coding: utf8 -*-
import csv, time, os, random, difflib, re, json
from itertools import islice
import httplib
import urllib2
from bs4 import BeautifulSoup as BS
from shutil import copyfile

def get_product_info(item_number):
    ''' Returns a dictionary of item information '''
    # offer_id_orig = '#NK420#########'
    offer_id_orig = item_number
    item_uts_number = offer_id_orig[1:6]
    print offer_id_orig

    # DOWNLOAD PRODUCT INFO IF WE DON'T HAVE IT
    def download_page_info(item_uts_number):
        #import time, os, random
        #import httplib
        #import urllib2
        # ITEM_NUMBER MUST BE 4NK4200000100 FORMAT
        # assigned_row_titles 'offer_id_orig' : current_row[2]
        print item_uts_number
        httplib.HTTPConnection.debuglevel = 1
        # take multiple arguments?
        fp = os.path.join(folder,item_uts_number)
        print "Fetching:"+fp+"\n"
        try:
            request = urllib2.Request('http://www.fingerhut.com/product/'+item_uts_number.upper()+'.uts')
            request.add_header('User-Agent','jmunsch_thnx_v2.0 +http://jamesmunsch.com/')
            opener = urllib2.build_opener()
            data = opener.open(request).read()
            with open(fp,'w+') as f:
                f.write(data)
            time.sleep(1+random.random())
        # ... (excerpt ends here; the exception handler is not shown)
```
Solution
As a piece of code for a “beginner” (reading from the tag), this is pretty good. I’ll start by offering some suggestions on your specific questions, and then make some comments on the code itself:
1. What is the best method/practice to keep, let's say, over 3,000 lines of code organized for readability?

   - Scratch readability for now; just keeping 3,000 lines of code organised will get painful quickly. Are you using some sort of version control system (VCS) with the code? If not, I’d recommend looking into one. As you maintain and extend the code, you can keep “snapshots” of working versions or experimental techniques. Using a VCS, and committing regularly, is a really good habit for larger codebases.

     The Rypress tutorial is a good introduction to Git, a fairly common VCS.

     You can either run a VCS locally, or Bitbucket offers free private repositories. (Bitbucket also have tutorials for Git and Mercurial when you sign up.)
   - Put TODO notes in a separate file. Otherwise they get hard to find and will get lost in the mix.
   - Write docstrings for all of your non-trivial functions. You remember what `onView()` is meant to do right now, but you might find it harder to remember in six months’ time when you’re debugging it. Read PEP 257 for the Python standards for writing docstrings.
   - Don’t litter the file with multiple newlines between functions or within functions. One is quite enough, and doesn’t break the file up so much.

2. What kind of NO-NOs in regards to coding habits should I get rid of before they become bad habits?

   - Don’t just copy-and-paste special strings into the code; declare them as global variables at the top of the file and use them as appropriate. This means you only have to change them in one place when you need to.

     For example, these lines from the two different files give me pause for thought:

         request.add_header('User-Agent','jmunsch_thnx_v2.0 +http://jamesmunsch.com/')
         user_agent = 'jmunsch_v3 (+http://jamesmunsch.com/)'

     Is there a reason for the two user agent strings to have a different format? And why do they have different version numbers?
   - Have you read PEP 8? This is the Python style guide. Overall you’re pretty good, but there are one or two things that could be tweaked:

     - Spacing around commas goes like an English sentence: none before, one after. So `except Exception,e:print e` becomes

           except Exception, e:
               print e

       Also, put blocks after a colon like this on a new line.
     - Same goes for whitespace around binary operators. So `print "Oops: "+str(e)` becomes `print "Oops: " + str(e)`.

     Style issues aren’t going to break your code, but fixing them will make it easier for other people to read and debug your code (and conversely, it will make it easier for you to read other people’s Python), because the styles match.

3. Inversely what kind of good coding habits should I be picking up?

   Don’t do what I said not to do in 2. ;)

4. Any glaringly obvious mistakes?

   Beyond what I’ve already mentioned, nothing stands out as being terrible. (At least, to my eyes.)

   If you want more on good programming practice, then I’d suggest reading a copy of The Pragmatic Programmer, which contains some excellent advice on good style (including most of what I wrote above, and a lot more).
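
To make the points about shared constants and docstrings a bit more concrete, here is a minimal sketch of how the top of the scraper module might look. The user agent string and the URL pattern are taken from the question; the names `USER_AGENT`, `PRODUCT_URL_TEMPLATE`, `build_product_url` and `build_headers` are hypothetical ones chosen for illustration, not anything from the original code. It avoids any network calls, so it should run the same under Python 2.7 and Python 3.

```python
"""Sketch: module-level constants and docstrings for the scraper."""

# Declared once at the top of the file, so every request uses the same values.
# The string values come from the question; the constant names are hypothetical.
USER_AGENT = 'jmunsch_v3 (+http://jamesmunsch.com/)'
PRODUCT_URL_TEMPLATE = 'http://www.fingerhut.com/product/{0}.uts'


def build_product_url(item_uts_number):
    """Return the product page URL for a UTS item number.

    The number is upper-cased before it is placed in the URL, as the
    question's code does with ``item_uts_number.upper()``.
    """
    return PRODUCT_URL_TEMPLATE.format(item_uts_number.upper())


def build_headers():
    """Return the HTTP headers to send with every request."""
    return {'User-Agent': USER_AGENT}


if __name__ == '__main__':
    # Spaces after commas and around binary operators, per PEP 8.
    print('Fetching: ' + build_product_url('nk420'))
```

If both files import these names from one shared module, a call such as `request.add_header('User-Agent', USER_AGENT)` picks up the same string everywhere, and a version bump happens in exactly one place.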
Code Snippets

    request.add_header('User-Agent','jmunsch_thnx_v2.0 +http://jamesmunsch.com/')
    user_agent = 'jmunsch_v3 (+http://jamesmunsch.com/)'

    except Exception, e:
        print e

Context
StackExchange Code Review Q#46223, answer score: 5