Will multi-threading or other method make my program run faster?
Problem
I haven't used multi-threading so far because I haven't needed to. But from what I've read, implementing it could make my program somewhat faster than it currently is.
My code is perfectly functional, but when it handles big files (> 2k rows) it runs really slowly. I'd like to get the best out of it and make it as fast as possible using multi-threading or any other method.
When answering, could somebody please explain whether using multi-threading will help me optimize the program or not? I'd also appreciate advice from your own experience on how I can optimize my code.
from validate_email import validate_email
import os
# the program reads each line from "emails.txt", checks each email,
# removes duplicates and sorts the good / bad emails
def verify_emails(all_emails_file, all_good_emails_file, all_bad_emails_file):
    with open(all_emails_file) as f:
        all_emails = f.readlines()
    rs_emails = [elem.strip('\n') for elem in all_emails]
    rs_emails_set = set(rs_emails)  # remove duplicates
    good_emails_file, bad_emails_file = open(all_good_emails_file, 'w+'), open(all_bad_emails_file, 'w+')
    for email in rs_emails_set:
        if validate_email(email, verify=True):
            print >> good_emails_file, email
        else:
            print >> bad_emails_file, email

if __name__ == "__main__":
    clear = lambda: os.system('cls')
    clear()
    try:
        verify_emails("emails.txt", "good_emails.txt", "bad_emails.txt")
    except:
        print "\n\nFile with emails could not be found. Please create emails.txt and run the program again\n\n"
Solution
As stated in the comments, multi-threading might not be what you are looking for.
I think there is room for improvement on the way you read your file. Currently:
- you read the whole file into a list:
  all_emails = f.readlines()
- you remove duplicates:
  rs_emails_set = set(rs_emails)  # remove duplicates
- and you iterate over every element of this set:
  for email in rs_emails_set:
Reading this comment, I strongly recommend testing the following:
processed_emails = set()
for email in f:
    email = email.rstrip('\n')  # strip the trailing newline before comparing
    if email not in processed_emails:
        # validate email or not
        processed_emails.add(email)

Instead of immediately writing to the good and bad emails files, you could store those into 2 lists and write them at once at the very end (removing some I/O with the filesystem as well).
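Combining both ideas, here is a minimal sketch of what the refactored function could look like. It is written for Python 3 (unlike the Python 2 code in the question), and it takes the validator as a parameter so that the real validate_email, or any stand-in, can be plugged in; the parameter name is my own choice, not part of the original code.

```python
def verify_emails(all_emails_file, good_path, bad_path, is_valid):
    """Stream the input file line by line, skip duplicates as they appear,
    and buffer the results so each output file is written in one pass."""
    processed = set()
    good, bad = [], []
    with open(all_emails_file) as f:
        for line in f:
            email = line.rstrip('\n')
            if email in processed:
                continue  # duplicate: already validated, skip it
            processed.add(email)
            # sort the email into the right bucket instead of writing now
            (good if is_valid(email) else bad).append(email)
    # single write per output file at the very end
    with open(good_path, 'w') as g:
        g.write('\n'.join(good))
    with open(bad_path, 'w') as b:
        b.write('\n'.join(bad))
```

Taking the validator as a parameter also makes the function easy to test without any network access, since validate_email(..., verify=True) otherwise contacts mail servers.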
Context
StackExchange Code Review Q#106914, answer score: 5