Adding a new class to HTML tag and writing it back with Beautiful Soup
Problem
I am working on an HTML document in which I need to add certain classes to some elements. In the following code, I am adding the class `img-responsive`.

```python
def add_img_class1(img_tag):
    try:
        img_tag['class'] = img_tag['class'] + ' img-responsive'
    except KeyError:
        img_tag['class'] = 'img-responsive'
    return img_tag

def add_img_class2(img_tag):
    if img_tag.has_attr('class'):
        img_tag['class'] = img_tag['class'] + ' img-responsive'
    else:
        img_tag['class'] = 'img-responsive'
    return img_tag

soup = BeautifulSoup(myhtml)
for img_tag in soup.find_all('img'):
    img_tag = add_img_class1(img_tag)  # or img_tag = add_img_class2(img_tag)
html = soup.prettify(soup.original_encoding)
with open("edited.html", "wb") as file:
    file.write(html)
```

- Both functions do the same thing; however, one uses exceptions and the other uses `has_attr` from BS4. Which is better, and why?
- Am I writing back to HTML the right way? Or should I convert the entire soup to UTF-8 (with `string.encode('UTF-8')`) and write that?
Solution
The second option is better, because the possible error is explicit. That said, in many cases in Python you should follow EAFP (easier to ask forgiveness than permission) and go for the `try` statement. However, we can do better.

get(value, default)
In BeautifulSoup, attributes behave like dictionaries. This means you can write `img_tag.get('class', '')` to get the class if it exists, or the empty string if it doesn't.

```python
def add_img_class(img_tag):
    img_tag['class'] = img_tag.get('class', '') + ' img-responsive'
```

You don't need to return `img_tag`: the function modifies the same tag object the caller holds, so the change is visible without a return value. Now that your function is a one-liner, you might as well use the one-liner directly.

Multi-valued attributes
Note that the above code doesn't work! `class` is a multi-valued attribute in HTML4 and HTML5, so BeautifulSoup 4, at least, returns a list instead of a string. The correct code becomes:

```python
img_tag['class'] = img_tag.get('class', []) + ['img-responsive']
```

Which is nicer, as you don't have to worry about the extra space between the two values.
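To see the list behaviour concretely, here is a minimal sketch that applies the one-liner to one image without a class and one with an existing class. The helper name `add_class`, the sample markup, and the duplicate check are assumptions added for illustration; they are not part of the original answer. The stdlib `html.parser` backend is used so no extra parser is needed.

```python
from bs4 import BeautifulSoup


def add_class(tag, class_name):
    """Append class_name to the tag's class list unless it is already there.

    Hypothetical helper: the duplicate check is an addition for illustration.
    """
    # For multi-valued attributes such as 'class', BeautifulSoup 4 stores a list,
    # so the default must be a list as well.
    classes = tag.get('class', [])
    if class_name not in classes:
        tag['class'] = classes + [class_name]


# Sample markup (assumed): one <img> without a class, one with a class.
html = '<p><img src="a.png"><img class="thumb" src="b.png"></p>'
soup = BeautifulSoup(html, 'html.parser')

for img in soup.find_all('img'):
    add_class(img, 'img-responsive')

# Each entry is the class list of one <img>, in document order.
class_lists = [img['class'] for img in soup.find_all('img')]
print(class_lists)
```

When the soup is serialized, the list is joined with single spaces, so there is no stray leading space to clean up.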
Encoding
You don't need to convert to UTF-8 before writing the file back. What's wrong with the approach you already have?

Code Snippets
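For the encoding question, a minimal sketch of writing the soup back in its original encoding might look like this. The sample bytes, the temporary output path, and the fallback to UTF-8 are assumptions for illustration: `soup.original_encoding` is `None` when the soup was built from an already-decoded `str`, and in that case `prettify(None)` returns a `str`, which cannot be written to a file opened in `"wb"` mode.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Sample input as bytes (assumed), so BeautifulSoup has an encoding to detect.
raw = b'<html><body><img src="a.png"></body></html>'
soup = BeautifulSoup(raw, 'html.parser')

# Fall back to UTF-8 when no original encoding was recorded (assumption).
encoding = soup.original_encoding or 'utf-8'

# With an explicit encoding, prettify() returns bytes rather than str.
data = soup.prettify(encoding)

# Hypothetical output location; a temporary directory keeps the sketch self-contained.
out_path = os.path.join(tempfile.mkdtemp(), 'edited.html')
with open(out_path, 'wb') as fh:
    fh.write(data)
```

Because the encoding is always a real codec name here, the value passed to `write` is bytes, which matches the binary file mode.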
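```python
# String-based version; fails in BeautifulSoup 4, where 'class' is a list.
def add_img_class(img_tag):
    img_tag['class'] = img_tag.get('class', '') + ' img-responsive'
```

```python
# List-based version that works with multi-valued attributes.
img_tag['class'] = img_tag.get('class', []) + ['img-responsive']
```

Context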
StackExchange Code Review Q#31523, answer score: 14