Read CSV with 3 columns and group some elements

Submitted by: @import:stackexchange-codereview · Mar 10, 2026

Problem

I have a CSV with 3 columns (date, name, number) that is about 20K rows long. I want to create a dictionary keyed by date whose value is a dictionary of name:number for that date. On top of that, I want to add some elements together if the name contains a keyword, so they are listed as keyword:sum of numbers rather than as individual entries.

E.g., if the CSV had four entries

6/17/84, Blackcat, 10,
6/17/84, Dog, 20,
6/17/84, Tabbycat, 12,
6/17/84, Lizard, 5

and the keyword is cat, the result should be

{6/17/84: {'Dog':20, 'Lizard':5, 'cat':22}}

Here's what I came up with. Is there a better way?

import csv
import operator
import collections
import time

def dict_of_csv(file_name, group_labels_with):
    complete_dict = {}
    key_word = [x.lower() for x in group_labels_with]
    for i in file_name:
        key = i[1].lower()
        key_value = int(i[2])
        row_date = time.strptime(i[0], "%m/%d/%y")
        if row_date not in complete_dict:
            complete_dict[row_date] = {}
            for name in key_word:
                complete_dict[row_date][name] = 0
        if any(name in key for name in key_word):
            for name in key_word:
                if name in key:
                    key = name
            complete_dict[row_date][key] += key_value
        else:
            complete_dict[row_date][key] = key_value
    return complete_dict

weeks = open("../DataIn/Master List 14day.csv", "rU")
real_labels = csv.reader(weeks)

print dict_of_csv(real_labels, ['keyword1', "keyword2", "keyword3"])

weeks.close()

Solution

The first argument to your function does not appear to be a string holding the name of a file, as its name led me to assume at first. I would suggest renaming it to something like csv_reader (right?) to make it clear what the function receives.

i is another poor variable name: it is usually used for an integer index, and it tells the reader nothing useful. Unpacking each row is also more meaningful than the current indexing into i:

for date, key, val in csv_reader:

You can then apply whatever processing you need and it's still clear what's happening:

val = int(val)
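One caveat with three-way unpacking: if the rows really look like the sample in the question, with trailing commas and padding after each comma, csv.reader yields a fourth empty field on some rows and the unpacking raises ValueError. A minimal sketch of one way around this, assuming Python 3 and an in-memory copy of the sample rows (the names sample and parsed are hypothetical):

```python
import csv
import io

# Rows modeled on the question's sample, including the trailing commas.
sample = io.StringIO(
    "6/17/84, Blackcat, 10, \n"
    "6/17/84, Dog, 20, \n"
    "6/17/84, Tabbycat, 12,\n"
    "6/17/84, Lizard, 5\n"
)

parsed = []
# skipinitialspace drops the padding that follows each delimiter.
for row in csv.reader(sample, skipinitialspace=True):
    # Slice to the first three fields so a trailing empty field is ignored.
    date, key, val = row[:3]
    parsed.append((date, key.lower(), int(val)))

print(parsed[0])  # ('6/17/84', 'blackcat', 10)
```

If the real file has exactly three fields per row, the plain unpacking shown above is enough.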

I see what you're trying to do here:

if any(name in key for name in key_word):
    for name in key_word:
        if name in key:
            key = name

But it won't save any time: you have to go through every name in key_word once when any(...) is False, and up to twice in the worst case when any(...) is True. Just use:

for name in key_word:
    if name in key:
        key = name

collections.defaultdict would simplify much of that code for you, for example:

from collections import defaultdict

complete_dict = defaultdict(lambda: defaultdict(int))
...
for date, key, val in csv_reader:
    ...
    date = time.strptime(date, "%m/%d/%y")
    for name in key_word:
        ...
    complete_dict[date][key] += val # don't need any checks for keys
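For reference, here is a minimal runnable sketch of the defaultdict approach. It assumes Python 3 (the question's code is Python 2), and the function name group_rows and the in-memory rows are hypothetical stand-ins for the asker's function and CSV reader.

```python
import time
from collections import defaultdict


def group_rows(rows, keywords):
    """Sum values per date, folding names that contain a keyword into it."""
    keywords = [k.lower() for k in keywords]
    totals = defaultdict(lambda: defaultdict(int))
    for date, key, val in rows:
        key = key.strip().lower()
        for name in keywords:
            if name in key:
                key = name
        # struct_time is hashable, so it can serve as the outer key.
        date = time.strptime(date.strip(), "%m/%d/%y")
        totals[date][key] += int(val)  # missing keys start at 0
    return totals


# Sample data from the question, as already-split rows.
rows = [
    ("6/17/84", "Blackcat", "10"),
    ("6/17/84", "Dog", "20"),
    ("6/17/84", "Tabbycat", "12"),
    ("6/17/84", "Lizard", "5"),
]
result = group_rows(rows, ["cat"])
day = time.strptime("6/17/84", "%m/%d/%y")
print(dict(result[day]))  # {'cat': 22, 'dog': 20, 'lizard': 5}
```

Two behaviours differ from the original function and may or may not matter: a keyword that matches nothing on a given date no longer appears with a value of 0, and a name that repeats on the same date is summed rather than overwritten. As in the original, names are lowercased, so the result has 'dog' rather than 'Dog'.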

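On a logical point, what should happen if a key contains more than one entry of key_word, or if the keywords overlap? At the moment the first match is used as long as there are no further complications: key is reassigned to the matched keyword, so later keywords are tested against that keyword rather than the original name, and a later keyword that is a substring of the matched one will replace it. You could break after the first match to guarantee first-match behaviour (and speed up the code).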
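The difference is easiest to see side by side. The sketch below assumes Python 3; the helper names match_without_break and first_match are hypothetical and exist only to isolate the loop.

```python
def match_without_break(key, keywords):
    """The loop as written: key is reassigned and the scan continues."""
    for name in keywords:
        if name in key:
            key = name
    return key


def first_match(key, keywords):
    """Same loop, but stop at the first keyword that matches."""
    for name in keywords:
        if name in key:
            key = name
            break
    return key


# Non-overlapping keywords: both versions pick the first match.
print(match_without_break("tabbycat", ["cat", "tabby"]))  # cat
print(first_match("tabbycat", ["cat", "tabby"]))          # cat

# Overlapping keywords: "cat" is a substring of "tabbycat".
print(match_without_break("tabbycat", ["tabbycat", "cat"]))  # cat
print(first_match("tabbycat", ["tabbycat", "cat"]))          # tabbycat
```

Whether first-match is the right rule depends on the data; ordering the keywords from most to least specific is one way to make it predictable.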

Context

StackExchange Code Review Q#63084, answer score: 3
