Email a notification when detecting changes on a website

Submitted by: @import:stackexchange-codereview

Problem

The text of a website is checked at a fixed interval. If anything has changed, an email is sent. There is an option to show or mail the new parts of the website. What could be improved?

```
#!/usr/bin/env python3
import urllib.request, hashlib, time, html2text, smtplib, datetime, argparse

class urlchange:
    def __init__(self, url):
        self.url = url
        self.urlhash = self.createhash()
        self.content = self.getcontent()
        date = datetime.datetime.now().strftime( "%d.%m.%Y %H:%M:%S" )
        print(date+": Start Monitoring... hash: "+self.urlhash)

    def getcontent(self):
        #Try to get data
        try:
            urldata = urllib.request.urlopen(self.url).read().decode("utf-8","ignore")
            urldata = html2text.html2text(urldata)
        except:
            print("Can't open url: ", self.url)
        return urldata

    def createhash(self):
        #create hash
        urldata = self.getcontent().encode("utf-8")
        md5hash = hashlib.md5()
        md5hash.update(urldata)
        return md5hash.hexdigest()

    def comparehash(self):
        date = datetime.datetime.now().strftime( "%d.%m.%Y %H:%M:%S" )
        if(self.createhash() == self.urlhash):
            print(date+": Nothing has changed")
            return False
        else:
            print(date+": Something has changed")
            if(not args.nodiff):
                print(self.diff())
                if(not args.nomail):
                    try:
                        sendmail("Url has changed!","The Url "+self.url+" has changed at "+date+" .\n\nNew content:\n"+self.diff())
                    except:
                        sendmail("Url has changed!","The Url "+self.url+" has changed at "+date+" .")
            elif(not args.nomail):
                sendmail("Url has changed!","The Url "+self.url+" has changed at "+date+" .")
            return True

    def diff(self):
        #what has changed
        start, end = 0, 0
        newcontent = self.getcontent
```

Solution

You do not need to find the differences by hand; difflib.SequenceMatcher can do it for you.

I think this is what you need:

>>> import difflib
>>> a, b = "foobxr", "foobar"
>>> diffs = difflib.SequenceMatcher(None, a, b).get_matching_blocks()
>>> diffs
[Match(a=0, b=0, size=4), Match(a=5, b=5, size=1), Match(a=6, b=6, size=0)]
>>> max((a, b), key=len)[diffs[0].size : diffs[1].a]
'x'
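
For the use case in the question, reporting the new parts of a page, a line-based diff may be easier to read than character offsets. The following is a minimal sketch, assuming the old and new page text are already available as strings; the function name `added_lines` and the sample texts are hypothetical.

```python
import difflib


def added_lines(old_text, new_text):
    """Return the lines that appear in new_text but not in old_text."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff prefixes every line with a two-character code;
    # "+ " marks a line that is only present in the second sequence.
    return [line[2:] for line in diff if line.startswith("+ ")]


# Hypothetical page contents from two consecutive checks
old = "Welcome\nOpening hours: 9-17\nContact us"
new = "Welcome\nOpening hours: 9-18\nContact us\nNow hiring"

for line in added_lines(old, new):
    print(line)
```

A changed line shows up as an addition here, because a line diff treats an edit as one removal plus one addition.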


Specific except

Name the exceptions you actually expect. A bare except catches everything, even errors caused by typos in your own code!
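
As an illustration, here is one way getcontent could name its exceptions. This is a sketch under assumptions, not the only possible fix: the `opener` parameter and the two stand-in openers are hypothetical and exist only so the example runs offline, and the html2text conversion is left out to keep the example in the standard library. It also returns None on failure, whereas the original would fail at `return urldata` because the variable is never assigned when the request fails.

```python
import io
import urllib.error
import urllib.request


def get_content(url, opener=urllib.request.urlopen):
    """Return the decoded page text, or None if the page cannot be fetched."""
    try:
        with opener(url) as response:
            return response.read().decode("utf-8", "ignore")
    except (urllib.error.URLError, ValueError) as error:
        # URLError covers network and HTTP failures (HTTPError is a subclass);
        # ValueError is raised by urlopen for a malformed URL.
        print("Can't open url:", url, "-", error)
        return None


# Hypothetical stand-ins for urlopen, so the sketch runs without a network
def fake_ok(url):
    return io.BytesIO(b"<p>hello</p>")


def fake_fail(url):
    raise urllib.error.URLError("no route to host")


print(get_content("http://example.invalid", opener=fake_ok))
print(get_content("http://example.invalid", opener=fake_fail))
```

A typo such as a misspelled variable name inside the try block would now raise a NameError instead of being silently reported as a network problem.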

Use a logger

You print a lot of information. A logger is more flexible and can easily be redirected to a file.
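
A minimal sketch of what that could look like with the standard logging module. The helper name `build_logger`, the logger name and the sample hash are assumptions made for illustration; the date format is the one used in the question.

```python
import logging
import sys


def build_logger(name="urlchange", stream=sys.stdout, logfile=None):
    """Create a logger that writes timestamped messages to a stream or a file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep messages out of the root logger

    # Writing to a file instead of the console is a one-argument change
    handler = logging.FileHandler(logfile) if logfile else logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s: %(message)s", datefmt="%d.%m.%Y %H:%M:%S")
    )
    logger.addHandler(handler)
    return logger


log = build_logger()
log.info("Start Monitoring... hash: %s", "d41d8cd9")  # hypothetical hash value
log.info("Nothing has changed")
```

The formatter adds the timestamp, so the manual `date = datetime.datetime.now().strftime(...)` calls are no longer needed for console output.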

.format

You can make your messages more readable with str.format. For example:

print(date+": Start Monitoring... hash: "+self.urlhash)


Becomes:

print("{date}: Start Monitoring... Hash: {self.urlhash}".format(**locals()))


Or the more standard:

print("{}: Start Monitoring... Hash: {}".format(date, self.urlhash))


Thanks to @Jatimir for noticing that in Python 3.6+, f-strings are a nice way to interpolate variables into strings with a clean syntax. For example:

print(f'{date}: Start Monitoring... Hash: {self.urlhash}')

Context

StackExchange Code Review Q#119951, answer score: 6
