HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Searching for a value from one CSV file in another CSV file

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
filesearchingcsvvalueoneforanotherfrom

Problem

I am writing a script that takes one CSV file searches for a value in another CSV file then writes an output depending on the result it finds.

I have been using Python's CSV Distreader and writer. I have it working, but it is very inefficient because it is looping through the 2 sets of data until it finds a result.

There are a few bits in the code which are specific to my setup (file locations etc), but I'm sure people can see around this.

# Set all csv attributes

cache = {}
in_file = open(sript_path + '/cp_updates/' + update_file, 'r')
reader = csv.DictReader(in_file, delimiter= ',')
out_file = open(sript_path + '/cp_updates/' + update_file + '.new', 'w')
out_file.write("StockNumber,SKU,ChannelProfileID\n")
writer = csv.DictWriter(out_file, fieldnames=('StockNumber', 'SKU', 'ChannelProfileID'), delimiter=',')
check_file = open(sript_path + '/feeds/' + feed_file, 'r')
ch_file_reader = csv.DictReader(check_file, delimiter=',')

#loop through the csv's, find stock levels and update file

for row in reader:
    #print row
    check_file.seek(0)
    found = False
    for ch_row in ch_file_reader:
        #if row['SKU'] not in cache:
        if ch_row['ProductCode'] == row[' Stock']:
            Stk_Lvl = int(ch_row[stk_lvl_header])
            if Stk_Lvl > 0:
                res = 3746
            elif Stk_Lvl == 0:
                res = 3745
            else:
                res = " "
            found = True
            print ch_row
            print res

            cache[row['SKU']] = res
    if not found:
        res = " "
        #print ch_row
        #print res
        cache[row['SKU']] = res     
    row['ChannelProfileID'] = cache[row['SKU']]
    writer.writerow(row)


This is a few lines from my in_file and also the outfile is the same structure. It just updates the ChannelProfileID depending on the results found.

`"StockNumber","SKU","ChannelProfileID"
"10m_s-vid#APTIIAMZ","2VV-10",3746
"10m_s-vid#CSE","2VV-10",3746
"1RR-01#CSE","1RR-01",374

Solution

What you want is a mapping from the product code (the key) to a stock level/result code (the value). In Python this is known as a dictionary. The way to do it is to go through your check file at the start, and use the information in it to create a dictionary containing all the stock level details. You then go through your input file, read in the product code, and retrieve the stock code from the dictionary you created earlier.

I've rewritten your code to do this, and it works for the example files you gave. I have commented it fairly thoroughly, but if there is anything unclear in it just post a comment and I'll try to clarify.

import csv

# Open the check file in a context manager. This ensures the file will be closed
# correctly if an error occurs.
with open('checkfile.csv', 'rb') as checkfile:
    checkreader = csv.DictReader(checkfile)

    # Create a function which maps the stock level to the result code.
    def result_code(stock_level):
        if stock_level > 0:
            return 3746
        if stock_level == 0:
            return 3745
        return " "

    # This does the real work. The middle line is a generator expression which
    # iterates over each line in the check file. The product code and stock
    # level are extracted from each line, the stock level converted into the
    # result, and the two values put together in a tuple. This is then converted
    # into a dictionary. This dictionary has the product codes as its keys and
    # their result code as its values.
    product_result = dict(
        (v['ProductCode'], result_code(int(v[' Stock']))) for v in checkreader
    )

# Open the input and output files.
with open('infile.csv', 'rb') as infile:
    with open('outfile.csv', 'wb') as outfile:
        reader = csv.DictReader(infile)

        # Use the same field names for the output file.
        writer = csv.DictWriter(outfile, reader.fieldnames)
        writer.writeheader()

        # Iterate over the products in the input.
        for product in reader:
            # Find the stock level from the dictionary we created earlier. Using
            # the get() method allows us to specify a default value if the SKU
            # does not exist in the dictionary.
            result = product_result.get(product['SKU'], " ")

            # Update the product info.
            product['ChannelProfileID'] = result

            # Write it to the output file.
            writer.writerow(product)

Code Snippets

import csv

# Open the check file in a context manager. This ensures the file will be closed
# correctly if an error occurs.
with open('checkfile.csv', 'rb') as checkfile:
    checkreader = csv.DictReader(checkfile)

    # Create a function which maps the stock level to the result code.
    def result_code(stock_level):
        if stock_level > 0:
            return 3746
        if stock_level == 0:
            return 3745
        return " "

    # This does the real work. The middle line is a generator expression which
    # iterates over each line in the check file. The product code and stock
    # level are extracted from each line, the stock level converted into the
    # result, and the two values put together in a tuple. This is then converted
    # into a dictionary. This dictionary has the product codes as its keys and
    # their result code as its values.
    product_result = dict(
        (v['ProductCode'], result_code(int(v[' Stock']))) for v in checkreader
    )

# Open the input and output files.
with open('infile.csv', 'rb') as infile:
    with open('outfile.csv', 'wb') as outfile:
        reader = csv.DictReader(infile)

        # Use the same field names for the output file.
        writer = csv.DictWriter(outfile, reader.fieldnames)
        writer.writeheader()

        # Iterate over the products in the input.
        for product in reader:
            # Find the stock level from the dictionary we created earlier. Using
            # the get() method allows us to specify a default value if the SKU
            # does not exist in the dictionary.
            result = product_result.get(product['SKU'], " ")

            # Update the product info.
            product['ChannelProfileID'] = result

            # Write it to the output file.
            writer.writerow(product)

Context

StackExchange Code Review Q#7113, answer score: 2

Revisions (0)

No revisions yet.