
Getting count value for matched document

Submitted by: @import:stackexchange-codereview

Problem

I have a list of keywords, and I want to count the number of records in a database that contain at least one of them. Only the title and description of each record should be checked.

Is this correctly written?

self.keyword_list = ['Buzz', 'Heard on the street', 'familiar with the development', 'familiar with', 'enters race', 'mull', 'consider', 'final stage', 'final deal', 'eye', 'eyes', 'probe', 'vie for', 'detects', 'allege', 'alleges', 'alleged', 'fabricated', 'inspection', 'inspected', 'to monetise', 'cancellation', 'control', 'pact', 'warning', 'IT scanner', 'Speculative', 'Divest', 'Buzz', 'Heard on the street', 'familiar with the development', 'familiar with the matter', 'Sources', 'source', 'Anonymous', 'anonymity', 'Rumour', 'Scam', 'Fraud', 'In talks', 'Likely to', 'Cancel', 'May', 'Plans to ', 'Raids', 'raid', 'search', 'Delisting', 'delist', 'Block', 'Exit', 'Cheating', 'Scouts','scouting', 'Default', 'defaulted', 'defaulter', 'Calls off', 'Lease out', 'Pick up', 'delay', 'arrest', 'arrested', 'inks', 'in race', 'enters race', 'mull', 'consider', 'final stage', 'final deal', 'eye', 'eyes', 'probe', 'vie for', 'detects', 'allege', 'alleges', 'alleged', 'fabricated', 'inspection', 'inspected', 'monetise', 'cancellation', 'control', 'pact', 'warning', 'IT scanner', 'Speculative', 'Divest']
fetch_record = self.collection1.find()
for record in fetch_record:
    count2 += 1
    for item in self.keyword_list:
        if item.lower() in u'{} {}'.format(record['title'], record['description']).lower():
            #print "Matched for item : ", item
            count1 += 1
            break 
print "Matched records are : ", count1
print "Total records are : ", count2

Solution

A few things that caught my eye:
Value Casing

You call item.lower() on every keyword again for every record. If you lower-case all items in the keyword list once, before iterating over the records, that step disappears from the inner loop.
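
As a minimal sketch of that idea (the shortened raw_keywords list and the name keywords are illustrative, not taken from the original class, and the snippet is written so that it also runs on Python 3), the normalization could happen once, before any record is read:

```python
# A shortened, illustrative subset of the keyword list from the question.
raw_keywords = ['Buzz', 'Heard on the street', 'In talks', 'Plans to ']

# Lower-case every keyword once, up front, instead of once per record.
keywords = [keyword.lower() for keyword in raw_keywords]

print(keywords)
```

The inner loop can then test `item in text` directly, with no per-iteration call to lower() on the keyword.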
Naming

count2 should be total, count1 should be matched, and collection1 needs a more descriptive name. Names are vital in allowing other programmers to understand your code.
Nitpicks

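  • You may get better performance by formatting the title/description string once per record, before entering the loop over the keyword list, rather than once per keyword.
  • The keyword list contains duplicate items (for example, 'Buzz', 'mull' and 'eye' each appear twice).
  • Remove code you do not need instead of commenting it out. Version control has got your back :)
  • keyword_list could simply be keywords.
  • Printing is not really a valid return value from a method; return the counts and let the caller print them.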
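
Taken together, the points above might look like the sketch below. It is only one possible arrangement, not the original author's code: the function name count_matches and the sample_records list are hypothetical, and plain dictionaries stand in for the documents that self.collection1.find() would return. It keeps the question's substring matching, so a keyword such as 'eye' still matches 'eyes'.

```python
def count_matches(records, keywords):
    """Return (matched, total) for records whose title or description
    contains at least one keyword, compared case-insensitively."""
    # Lower-case and de-duplicate the keywords once, before the loop.
    normalized = {keyword.lower() for keyword in keywords}
    matched = 0
    total = 0
    for record in records:
        total += 1
        # Build the searchable text once per record, not once per keyword.
        text = u'{} {}'.format(record['title'], record['description']).lower()
        # any() stops at the first hit, like the original break.
        if any(keyword in text for keyword in normalized):
            matched += 1
    return matched, total


# Hypothetical stand-in data; real records would come from the database.
sample_records = [
    {'title': 'Company in talks with rival', 'description': 'Sources say a deal is close'},
    {'title': 'Quarterly results', 'description': 'Revenue rose 5 percent'},
    {'title': 'Regulator opens PROBE', 'description': 'No comment'},
]

matched, total = count_matches(sample_records, ['In talks', 'probe', 'Probe'])
print("Matched records are :", matched)
print("Total records are :", total)
```

Returning the two counts leaves the decision to print, log or test them with the caller.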

Context

StackExchange Code Review Q#104399, answer score: 4
