Extracting a div from parsed HTML
Problem
It seems that lxml/etree are generally imported as
from lxml import etree
Why is that? It keeps the code tidier, and while the potential namespace ambiguity might not be a concern, I have no incentive to do this, as it's generally frowned upon.
I know that for a script of this size it doesn't matter much, but I'm going to be using these modules for a lot more. I'm also curious about what others have to say.
#!/usr/bin/python
# Stuart Powers http://sente.cc/
import sys
import urllib
import lxml.html
from cStringIO import StringIO
""" This script parses HTML and extracts the div with an id of 'search-results':
ex: ...
$ python script.py "http://www.youtube.com/result?search_query=python+stackoverflow&page=1"
The output, if piped to a file would look like: http://c.sente.cc/E4xR/lxml_results.html
"""
parser = lxml.html.HTMLParser()
filecontents = urllib.urlopen(sys.argv[1]).read()
tree = lxml.etree.parse(StringIO(filecontents), parser)
node = tree.xpath("//div[@id='search-results']")[0]
# Serialize only the extracted div, not the whole tree.
print lxml.etree.tostring(node, pretty_print=True)
Solution
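For reference, here is a minimal sketch of the import spellings at issue. It uses the standard library's xml.etree package as a stand-in for lxml (an assumption, made only so the snippet runs without third-party installs), and the package name in the final comment is hypothetical.

```python
# Stand-in for lxml: the standard library's xml.etree package (assumption,
# chosen only so the sketch runs without third-party installs).

# Absolute import, spelled in full; names are reached via the dotted path.
import xml.etree.ElementTree

# Absolute import, "from" form; binds the submodule to a short local name.
from xml.etree import ElementTree

# Both spellings refer to the very same module object.
same_module = xml.etree.ElementTree is ElementTree

# Parse a small document and pull out a div by id, using the short name.
root = ElementTree.fromstring(
    "<html><body><div id='search-results'>hit</div></body></html>"
)
div = root.find(".//div[@id='search-results']")
div_text = div.text

# A relative import, by contrast, is resolved against the current package
# and is only valid inside one, e.g. in a hypothetical mypackage/module_a.py:
#     from . import module_b
```

Both forms above are absolute imports; the "from" form only changes which name ends up in the local namespace.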
You might be confusing
from lxml import etree
which is a legitimate (even preferred) form of absolute import, with relative imports for intra-package imports, which are discouraged: http://www.python.org/dev/peps/pep-0008/ (see the "Imports" section).
Context
StackExchange Code Review Q#7430, answer score: 3