Showing virus scan results from an API and a CSV file

Submitted by: @import:stackexchange-codereview

Problem

I am using an API module to interact with virustotal.com in order to get antivirus results based on SHA256 hashes. The code I have works, but I feel it could be improved greatly. I think that I am brute forcing my way to some of the text fields I am looking for. Any suggestions for optimization would be greatly appreciated.

Python Code:

import csv
import time

def virustotal(hashvalue):
    from virus_total_apis import PublicApi as VirusTotalPublicApi

    API_KEY = 'deadbeef12345facefaceface555555555555555555555555555555555555555'
    vt = VirusTotalPublicApi(API_KEY)
    response = vt.get_file_report(hashvalue)
    return response

with open('C://SHA256.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    for row in readCSV:
        tempString = str(row)
        tempIndex = tempString.index(',')
        filename = tempString[2:tempIndex-1]
        filehash = tempString[tempIndex+3:-2]
        results = str(virustotal(filehash))
        try:
            tempposit1 = results.index('positives')
            tempposit2 = results.index(',', tempposit1)
            positive = results[tempposit1 + 11:tempposit2]
            temptotal = results.index("'total':")
            temptotal2 = results.index(',', temptotal)
            total = results[temptotal+8:temptotal2]
            finalresults = positive.strip() + "/" + total.strip()
        except ValueError:
            finalresults = "Not_Found"
        print(filename + "," + filehash + "," + finalresults)
        time.sleep(15) #needed delay due to API restrictions


SHA256.csv - example (input data that is being read)

hplgtv_enxml.dll,DDBC281FFE95DDCBE1CC67DF475BC4D553C27306CF978108DE4B6E6ECCF87B69
nsi.dll,732F77DC897A423AFC2A7502D2E103829A3960656A103A2243B52A7F00A40556
hplgtv_timages.dll,DF00AB06ACE714D31C9B592798C3359174198C0D0FB022889F2736F28F7E17C2


Results for hplgtv_enxml.dll from Virustotal.com:

```
{
"response_code": 200,
"results": {
"scan_id": "ddb
```

Solution

"I think that I am brute forcing my way to some of the text fields I am looking for."

Indeed.

csv.reader()


Each row returned by csv.reader() is a list of strings. So your line

tempString = str(row)


is undoing the work csv did for you (which is the entire reason for using csv). Instead, you should do this:

readCSV = csv.reader(csvfile, delimiter=',')
for row in readCSV:
    filename = row[0]
    filehash = row[1]
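
To see what csv.reader() actually yields, here is a small self-contained sketch. It reads two of the sample rows from an in-memory string rather than from C://SHA256.csv; the io.StringIO stand-in and the names sample_csv and rows are assumptions made only for illustration.

```python
import csv
import io

# In-memory stand-in for SHA256.csv, using two rows from the example input.
sample_csv = io.StringIO(
    "hplgtv_enxml.dll,DDBC281FFE95DDCBE1CC67DF475BC4D553C27306CF978108DE4B6E6ECCF87B69\n"
    "nsi.dll,732F77DC897A423AFC2A7502D2E103829A3960656A103A2243B52A7F00A40556\n"
)

rows = list(csv.reader(sample_csv, delimiter=','))

# Each row is already a list of strings, so it can be unpacked directly
# instead of being converted with str() and sliced by index.
for filename, filehash in rows:
    print(filename, filehash)
```

Unpacking with "for filename, filehash in ..." is equivalent to row[0] and row[1] as long as every row has exactly two columns.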


Use the dict returned by vt.get_file_report()

Edit: my original response directed the OP to use the json library to parse JSON-formatted text. That was incorrect, as the returned data has already been parsed by the json library.

The response returned from your virustotal() function is a dict of the key-value data shown in your "Results for hplgtv_enxml.dll from Virustotal.com" section.

Thus, you just need to access the appropriate keys of that dict. The "positives" and "total" fields you were looking for are under the "results" key:

results = virustotal(filehash)
positive = results['results']['positives']
total = results['results']['total']
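
As a rough illustration of that lookup without calling the API, the sketch below uses a hand-written dict. Its shape follows the description above ("positives" and "total" under "results"), but sample_response, empty_response, and the numbers in them are placeholders, not real scan output. The shape of a response for an unknown hash is also an assumption here.

```python
# Hypothetical response shaped like the dict described above; the numbers are
# placeholders, not real scan results.
sample_response = {
    "response_code": 200,
    "results": {"positives": 0, "total": 55},
}

results = sample_response["results"]
ratio = "{}/{}".format(results["positives"], results["total"])
print(ratio)

# Assumed shape for a hash with no report: the 'positives' key is absent.
# Dict indexing then raises KeyError, whereas str.index() raised ValueError.
empty_response = {"response_code": 200, "results": {}}
try:
    empty_response["results"]["positives"]
    missing = False
except KeyError:
    missing = True
```

This matters for error handling: once the string searching is gone, the exception to catch for a missing field is KeyError.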


Because everything you want is under the "results" sub-dictionary, I would just index it (['results']) directly at the end of the virustotal() call.

Here are the combined suggested modifications (using the list that csv.reader() returns, and using the dict that vt.get_file_report() returns):

for row in readCSV:
    filename = row[0]
    filehash = row[1]
    try:
        results = virustotal(filehash)['results']
        positive = results['positives']
        total = results['total']
        finalresults = "{}/{}".format(positive, total)
    except KeyError:  # a missing dict key raises KeyError, not ValueError
        finalresults = "Not_Found"
    print("{},{},{}".format(filename, filehash, finalresults))
    time.sleep(15)  # needed delay due to API restrictions
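
One possible way to package the same idea so it can be tried without an API key is sketched below. The helper names detection_ratio, scan_rows, and fake_lookup are hypothetical, and fake_lookup returns made-up placeholder data; in real use you would pass the virustotal() function from the question as lookup and leave delay at 15.

```python
import csv
import io
import time


def detection_ratio(response):
    """Return 'positives/total' from a report dict, or 'Not_Found'."""
    try:
        results = response['results']
        return "{}/{}".format(results['positives'], results['total'])
    except (KeyError, TypeError):
        # KeyError: a field is missing; TypeError: response is not a dict.
        return "Not_Found"


def scan_rows(csvfile, lookup, delay=15):
    """Yield one 'filename,hash,ratio' line per two-column CSV row."""
    for filename, filehash in csv.reader(csvfile, delimiter=','):
        yield "{},{},{}".format(filename, filehash,
                                detection_ratio(lookup(filehash)))
        time.sleep(delay)  # delay between API calls, as in the question


def fake_lookup(filehash):
    """Stand-in for virustotal(); returns placeholder data, not real results."""
    if filehash == "AAAA":
        return {"response_code": 200, "results": {"positives": 1, "total": 55}}
    return {"response_code": 200, "results": {}}


# Two made-up rows; delay=0 so the demonstration does not wait.
lines = list(scan_rows(io.StringIO("a.dll,AAAA\nb.dll,BBBB\n"),
                       fake_lookup, delay=0))
for line in lines:
    print(line)
```

Keeping the parsing in its own function means the "Not_Found" fallback can be checked with plain dicts, independently of the network call and its rate limit.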

Context

StackExchange Code Review Q#136548, answer score: 7