Script to create a customized, efficient adblocking hosts file
Problem
I am writing a Python script that creates a customized, efficient adblocking hosts file. Basically it checks the user's DNS cache and sees if any entries there are listed in the popular hosts files. If so, it adds those entries to the user's hosts file. The purpose is to keep the hosts file small and avoid adverse effects on browsing speed. I was looking for any input on my overall approach, any ways to make it faster, and any stylistic revisions/suggestions.
```python
import os
import urllib.request
import subprocess
import re
import datetime

# Create a list of ad domains from hosts files found online
def get_ad_dmns(src_urls):
    dmns = set()
    for src in src_urls:
        entries = [line.decode('utf-8') for line in list(urllib.request.urlopen(src))]
        for entry in entries:
            # If hosts file entry is a valid block rule, add domain to list
            if entry.startswith(('0.0.0.0', '127.0.0.1')):
                dmns.add(entry.split()[1])
    return dmns

# Create a list of domains found in the user's DNS cache
def get_dns_dmns():
    dns_cache = subprocess.check_output('ipconfig /displaydns').decode('utf-8')
    # Regex pattern to match domains in the DNS cache (raw string so "\s" is not a string escape)
    pattern = r'\n\s+(\S+)\r\n\s+-'
    dmns = re.findall(pattern, dns_cache)
    return dmns

# Create a list of domains currently in the user's hosts file
def get_cur_dmns(hosts_dir):
    os.chdir(hosts_dir)
    dmns = set()
    hosts_file = open('hosts', 'r')
    for entry in hosts_file:
        if entry.startswith(('0.0.0.0', '127.0.0.1')):
            dmns.add(entry.split()[1])
    hosts_file.close()
    return dmns

# Write new domains to the hosts file
def write_hosts_file(dmns, hosts_dir):
    os.chdir(hosts_dir)
    hosts_file = open('hosts', 'a')
    hosts_file.write('\n# Updated: {}\n'.format(datetime.datetime.now()))
    for dmn in dmns:
        hosts_file.write('0.0.0.0 {}\n'.format(dmn))
    hosts_file.close()

def main():
    hosts_dir = 'C:/Windows/System32/drivers/etc'
    # ... (rest of main() not shown)
```
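The regex in `get_dns_dmns()` relies on `ipconfig /displaydns` printing each cached record name on its own indented line, immediately followed by an indented line of dashes. The sketch below runs the same pattern against a short, made-up excerpt in that shape so it can be tried without Windows; the sample text is an assumption about the output layout, not captured output.

```python
import re

# Same pattern as get_dns_dmns(), written as a raw string.
PATTERN = r'\n\s+(\S+)\r\n\s+-'

# Hypothetical excerpt shaped like `ipconfig /displaydns` output (CRLF line endings).
sample = (
    "\r\nWindows IP Configuration\r\n"
    "\r\n"
    "    ads.example.com\r\n"
    "    ----------------------------------------\r\n"
    "    Record Name . . . . . : ads.example.com\r\n"
    "    Record Type . . . . . : 1\r\n"
    "\r\n"
    "    www.example.org\r\n"
    "    ----------------------------------------\r\n"
    "    Record Name . . . . . : www.example.org\r\n"
)

# Only names directly followed by a dashed line are captured;
# the "Record Name . . ." lines do not match.
domains = re.findall(PATTERN, sample)
print(domains)  # ['ads.example.com', 'www.example.org']
```

If the real output is laid out differently (for example, different line endings), the pattern will silently return fewer names, so it is worth checking against the actual command output.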
Solution
Overall, this is pretty good, but all code can be better! A few suggestions:
- There aren’t any docstrings telling me what a function does or what it returns. With a small codebase like this, it’s fairly easy for me to just read the code, but it’s a good habit to get into.
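- I’m nitpicking, but module imports should really be alphabetically ordered (PEP 8 only asks for them to be grouped into standard library, third-party, and local imports, but sorting within each group is a common convention). It’s not so much of a problem for short scripts like this, but it’s really useful in large codebases.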
- In the `get_ad_dmns()` function, you create the `entries` list by iterating over `urllib.request.urlopen(src)`, then immediately iterate over that list. Could you skip creating the list? That is:
  ```python
  for line in urllib.request.urlopen(src):
      entry = line.decode('utf-8')
      # rest of the code here
  ```
  I think that might be a little simpler and more memory efficient, since lines are then read from the response as you go instead of all being held in memory first.
- Rather than using `dmns` as an abbreviation for `domains`, just use the full name. Characters are cheap, and it makes your code easier to read.
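- In `get_cur_dmns()`, rather than using `hosts_file = open('hosts', 'r')` ... `hosts_file.close()`, the more idiomatic construction is a `with` block, which also closes the file if an exception is raised partway through:
  ```python
  with open('hosts') as hosts_file:
      for entry in hosts_file:
          ...  # do stuff with the entry
  ```
  Ditto for `write_hosts_file()`.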
- In your `main()` function, the `dmns_to_add` list comprehension is just a little hard to read. I’d suggest putting the `not in cur_dmns` condition on its own line to make it easier to read.
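Putting several of these suggestions together, the helpers might look something like the sketch below: docstrings, alphabetized imports, full names, `with` blocks, and direct iteration over the response. It is not the original author's code. `parse_block_rules()` and `select_new_domains()` are hypothetical helpers introduced here; the functions build a path with `os.path.join` instead of calling `os.chdir()` (a change the answer does not propose); and because the question's `main()` is not shown, `select_new_domains()` only guesses at what the `dmns_to_add` comprehension does, formatted with the condition split across lines as suggested. `get_dns_dmns()` is omitted since it would only change in name.

```python
import datetime
import os
import urllib.request

BLOCK_PREFIXES = ('0.0.0.0', '127.0.0.1')


def parse_block_rules(lines):
    """Return the set of domains named in hosts-style block rules.

    A block rule starts with 0.0.0.0 or 127.0.0.1 and is followed by a
    domain; comments, blank lines, and bare addresses are ignored.
    """
    domains = set()
    for line in lines:
        fields = line.split()
        if line.startswith(BLOCK_PREFIXES) and len(fields) > 1:
            domains.add(fields[1])
    return domains


def get_ad_domains(source_urls):
    """Download each hosts file in source_urls and return its blocked domains."""
    domains = set()
    for source in source_urls:
        # Iterate over the response directly instead of building a list first.
        with urllib.request.urlopen(source) as response:
            domains |= parse_block_rules(
                line.decode('utf-8') for line in response
            )
    return domains


def get_current_domains(hosts_dir):
    """Return the set of domains already blocked in hosts_dir/hosts."""
    with open(os.path.join(hosts_dir, 'hosts')) as hosts_file:
        return parse_block_rules(hosts_file)


def write_hosts_file(domains, hosts_dir):
    """Append a timestamp comment and one 0.0.0.0 rule per domain to the hosts file."""
    with open(os.path.join(hosts_dir, 'hosts'), 'a') as hosts_file:
        hosts_file.write('\n# Updated: {}\n'.format(datetime.datetime.now()))
        for domain in domains:
            hosts_file.write('0.0.0.0 {}\n'.format(domain))


def select_new_domains(dns_domains, ad_domains, current_domains):
    """Return cached domains that are ad domains but not yet in the hosts file."""
    # dict.fromkeys drops duplicate cache entries while keeping their order.
    return [
        domain
        for domain in dict.fromkeys(dns_domains)
        if domain in ad_domains
        and domain not in current_domains
    ]
```

Wired together, a `main()` along these lines might pass `select_new_domains(get_dns_dmns(), get_ad_domains(source_urls), get_current_domains(hosts_dir))` to `write_hosts_file()`, where `source_urls` stands for whatever list of hosts-file URLs the script uses. Sharing `parse_block_rules()` between the download and the local-file paths means the "is this a block rule?" check lives in one place.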
Context
StackExchange Code Review Q#96679, answer score: 5