Getting count value for matched document
Problem
I have a list of keywords and I want to count the number of records in a db that contain any one of these keywords. From each record, I want only the title and description to be checked. Is this correctly written?
self.keyword_list = ['Buzz', 'Heard on the street', 'familiar with the development', 'familiar with', 'enters race', 'mull', 'consider', 'final stage', 'final deal', 'eye', 'eyes', 'probe', 'vie for', 'detects', 'allege', 'alleges', 'alleged', 'fabricated', 'inspection', 'inspected', 'to monetise', 'cancellation', 'control', 'pact', 'warning', 'IT scanner', 'Speculative', 'Divest', 'Buzz', 'Heard on the street', 'familiar with the development', 'familiar with the matter', 'Sources', 'source', 'Anonymous', 'anonymity', 'Rumour', 'Scam', 'Fraud', 'In talks', 'Likely to', 'Cancel', 'May', 'Plans to ', 'Raids', 'raid', 'search', 'Delisting', 'delist', 'Block', 'Exit', 'Cheating', 'Scouts','scouting', 'Default', 'defaulted', 'defaulter', 'Calls off', 'Lease out', 'Pick up', 'delay', 'arrest', 'arrested', 'inks', 'in race', 'enters race', 'mull', 'consider', 'final stage', 'final deal', 'eye', 'eyes', 'probe', 'vie for', 'detects', 'allege', 'alleges', 'alleged', 'fabricated', 'inspection', 'inspected', 'monetise', 'cancellation', 'control', 'pact', 'warning', 'IT scanner', 'Speculative', 'Divest']
fetch_record = self.collection1.find()
for record in fetch_record:
    count2 += 1
    for item in self.keyword_list:
        if item.lower() in u'{} {}'.format(record['title'], record['description']).lower():
            #print "Matched for item : ", item
            count1 += 1
            break
print "Matched records are : ", count1
print "Total records are : ", count2
Solution
A few things that catch my eye:
Value Casing
You run item.lower() for each item in the keyword list every time you iterate over a record. If you lower-case all items in the keyword list once, before looping over the records, that step disappears from the inner loop.
Naming
count2 should be total, count1 should be matched, and collection1 needs a better name. Names are vital in allowing other programmers to understand your code.
Nitpicks
- You may get better performance by formatting the string once per record, before entering the loop over the keyword list.
- The keyword list contains duplicate items.
- Remove code you do not need instead of commenting it out. Version control has got your back :)
- keyword_list could be keywords...
- Printing is not really a valid return from a method.
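Taken together, the points above could look roughly like the sketch below. It is only an illustration and makes some assumptions: it is written for Python 3, the function names normalize_keywords and count_matching_records are made up for this example, and records is any iterable of dict-like objects with 'title' and 'description' keys (for instance the cursor returned by the collection's find()). The matching rule is the same case-insensitive substring test as in the question.

```python
def normalize_keywords(keywords):
    """Lower-case the keywords once and drop duplicates."""
    return sorted({keyword.lower() for keyword in keywords})


def count_matching_records(records, keywords):
    """Return (matched, total), where matched counts the records whose
    title or description contains at least one keyword."""
    normalized = normalize_keywords(keywords)
    matched = 0
    total = 0
    for record in records:
        total += 1
        # Build the searchable text once per record, not once per keyword.
        text = u'{} {}'.format(record['title'], record['description']).lower()
        # any() stops at the first hit, like the break in the original loop.
        if any(keyword in text for keyword in normalized):
            matched += 1
    return matched, total


# Hypothetical sample data standing in for the database records.
records = [
    {'title': 'Firm in talks to divest unit', 'description': 'Sources say so.'},
    {'title': 'Quarterly results', 'description': 'Revenue rose.'},
]
matched, total = count_matching_records(records, ['In talks', 'Divest', 'in talks'])
print("Matched records are :", matched)
print("Total records are :", total)
```

Returning the two counts leaves it to the caller to decide whether to print, log, or store them.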
Context
StackExchange Code Review Q#104399, answer score: 4