
Let's check that domain port

Submitted by: @import:stackexchange-codereview

Problem

Intro

This simple script lets me check whether a specific port is open on a list of domains that I own. Instead of doing this check manually, I figured Python would be a good fit for the task.

After profiling my code, I found out that check_for_open_ports() is really slow. It takes about 0:01:16.799242 (roughly 77 seconds) for 4 domains.

I wondered if there's a good / recommended way of improving this (maybe multithreading / multiprocessing). While asking for an answer which implements one of the above two methods is forbidden here, I wouldn't mind seeing one. As far as I know, multiprocessing is meant for CPU-bound tasks and multithreading for I/O-bound ones; since this task is I/O-bound, I believe I should go with a multithreading solution.

The code

```
from socket import gethostbyname, gaierror, error, socket, AF_INET, SOCK_STREAM
from sys import argv, exit
import re

DOMAINS_FILE = argv[1]
PORT = argv[2]
OUTPUT_FILE = argv[3]

def get_domains():
    """
    Return a list of domains from domains.txt
    """
    domains = []
    if len(argv) != 4:
        exit("Wrong number of arguments\n")
    try:
        with open(DOMAINS_FILE) as domains_file:
            for line in domains_file:
                domains.append(line.rstrip())
    except IOError:
        exit("First argument should be a file containing domains")
    return domains

def check_domain_format(domain):
    """
    This function removes the beginning of a domain if it starts with:

    www.
    http://
    http://www.
    https://
    https://www.
    """
    clear_domain = re.match(r"(https?://(?:www\.)?|www\.)(.*)", domain)
    if clear_domain:
        return clear_domain.group(2)
    return domain

def transform_domains_to_ips():
    """
    Return a list of ips specific to the domains in domains.txt
    """
    domains = get_domains()
    domains_ip = []
    for each_domain in domains:
        each_domain = check_domain_format(each_domain)
        try:
            domains_ip.append(gethostbyname(each_domain))
        except gaierror:
```

Solution


  • First, a slight style note (IMHO, of course). You called your function check_domain_format, but it actually returns a modified string, and you use that result rather than checking anything. I'd go for a name like validate_domain_format.
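
As a small sketch of that rename, here is the function from the question under the suggested name. The regular expression is copied from the question; the body is only condensed, and the docstring wording is my own.

```python
import re


def validate_domain_format(domain):
    """Return the domain without a leading scheme and/or 'www.' prefix."""
    # Same pattern as in the question: optional http(s):// and optional www.
    match = re.match(r"(https?://(?:www\.)?|www\.)(.*)", domain)
    # The function returns a (possibly modified) string, it does not "check".
    return match.group(2) if match else domain
```

With this version, both `validate_domain_format("https://www.example.com")` and `validate_domain_format("example.com")` return the bare domain, which is why a name describing a transformation reads more naturally than one describing a check.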



About it being slow:

  • Yes, multi-threading would help in checking multiple domains at once, but if that were the only problem, you could just write a separate bash script that launches your Python script with different parameters.

  • You said that you own the domains, so I'm assuming you have raw socket capabilities. If that's the case, you can speed up your check by using a SYN check. You can have a look here; even though the question has been down-voted, it should give you the general idea. Here you can find that same check.

  • If you're doing this for educational purposes, that's OK; otherwise nmap will most likely do a better job, give you more options, and be faster (because the SYN check is already implemented, and you can also scan for UDP ports, for example).
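
The multi-threading idea from the first point above could be sketched roughly as follows. This is not the code from the question (its check_for_open_ports is not shown here); it assumes Python 3, uses a plain TCP connect rather than a SYN check, and the names is_port_open and check_hosts, the 3-second timeout and the worker count are all hypothetical choices of mine.

```python
from concurrent.futures import ThreadPoolExecutor
from socket import create_connection


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs a full TCP connect.
        with create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures (gaierror).
        return False


def check_hosts(hosts, port, max_workers=20, timeout=3.0):
    """Check all hosts concurrently and return a {host: is_open} mapping."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each connect mostly waits on the network, so threads overlap well.
        results = pool.map(lambda h: is_port_open(h, port, timeout), hosts)
        return dict(zip(hosts, results))
```

A call such as `check_hosts(["example.com", "example.org"], 80)` (placeholder host names) would then take roughly as long as the slowest single check instead of the sum of all of them. The explicit timeout matters as much as the threads: without one, a host that silently drops packets can block a connect for a long time.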

Context

StackExchange Code Review Q#154196, answer score: 2
