Read CSV with 3 columns and group some elements
Problem
I have a CSV with 3 columns (date, name, number) and it is about 20K rows long. I want to create a dictionary keyed by date whose value is a dictionary of name:number for that date. On top of that, I want to add some elements together if the name contains a keyword, so they would be listed as keyword:sum of numbers rather than as their individual entries.

E.g. if the CSV had four entries

6/17/84, Blackcat, 10,
6/17/84, Dog, 20,
6/17/84, Tabbycat, 12,
6/17/84, Lizard, 5

and the keyword is cat, the result should be {6/17/84: {'Dog':20, 'Lizard':5, 'cat':22}}

Here's what I came up with. Is there a better way?

import csv
import operator
import collections
import time

def dict_of_csv(file_name, group_labels_with):
    complete_dict = {}
    key_word = [x.lower() for x in group_labels_with]
    for i in file_name:
        key = i[1].lower()
        key_value = int(i[2])
        row_date = time.strptime(i[0], "%m/%d/%y")
        if row_date not in complete_dict:
            complete_dict[row_date] = {}
            for name in key_word:
                complete_dict[row_date][name] = 0
        if any(name in key for name in key_word):
            for name in key_word:
                if name in key:
                    key = name
            complete_dict[row_date][key] += key_value
        else:
            complete_dict[row_date][key] = key_value
    return complete_dict

weeks = open("../DataIn/Master List 14day.csv", "rU")
real_labels = csv.reader(weeks)
print dict_of_csv(real_labels, ['keyword1', "keyword2", "keyword3"])
weeks.close()

Solution
The first argument to your function does not appear to be a string holding the name of a file, as its name led me to initially assume. I would suggest you change it to e.g. csv_reader (right?) to make it clear what we're getting.

i is another bad variable name; it's usually used for an integer index, and doesn't tell the reader anything useful. Also, the following is more meaningful than the current indices into i:

for date, key, val in csv_reader:

You can then apply whatever processing you need and it's still clear what's happening:

val = int(val)

I see what you're trying to do here:

if any(name in key for name in key_word):
    for name in key_word:
        if name in key:
            key = name

But this won't save any time: you have to go through every name in key_word once in the case where any(...) is False, and fully twice in the worst case where any(...) is True. Just use:

for name in key_word:
    if name in key:
        key = name

collections.defaultdict would simplify much of that code for you, for example:

complete_dict = defaultdict(lambda: defaultdict(int))
...
for date, key, val in csv_reader:
    ...
    date = time.strptime(date, "%m/%d/%y")
    for name in key_word:
        ...
    complete_dict[date][key] += val  # don't need any checks for keys

On a logical point, what should happen if a key contains more than one key_word, or if there are overlaps between key_words? At the moment the first match is used in cases where there are no further complications, but you could break to ensure this (and speed up the code).

Code Snippets
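
Pulling the review's suggestions together (tuple unpacking, a single keyword loop with break, and a nested defaultdict), a minimal sketch might look like the following. It assumes Python 3, whereas the question's code is Python 2. The names group_csv_rows, key_words and totals are hypothetical, and io.StringIO stands in for the real file. Slicing each row to three fields and stripping whitespace is an assumption based on the sample rows, which have spaces after the commas and, in places, a trailing comma.

```python
import csv
import io
import time
from collections import defaultdict


def group_csv_rows(csv_reader, group_labels_with):
    """Build {date: {name_or_keyword: total}} from (date, name, number) rows."""
    key_words = [word.lower() for word in group_labels_with]
    totals = defaultdict(lambda: defaultdict(int))
    for row in csv_reader:
        # Assumption: only the first three fields matter. A trailing comma,
        # as in the question's sample, produces an extra empty field.
        date, key, val = (field.strip() for field in row[:3])
        key = key.lower()  # names are lowercased, as in the question's code
        for word in key_words:
            if word in key:
                key = word
                break  # the first matching keyword wins
        # No membership checks needed: missing keys start at 0.
        totals[time.strptime(date, "%m/%d/%y")][key] += int(val)
    return totals


# Sample data from the question, read from memory instead of a file.
sample = io.StringIO(
    "6/17/84, Blackcat, 10,\n"
    "6/17/84, Dog, 20,\n"
    "6/17/84, Tabbycat, 12,\n"
    "6/17/84, Lizard, 5\n"
)
result = group_csv_rows(csv.reader(sample), ["cat"])
for date, names in result.items():
    print(time.strftime("%m/%d/%y", date), dict(names))
```

With break in place, the order of group_labels_with decides which keyword a name is grouped under when several match, so overlapping keywords should be listed in order of priority. Unlike the question's version, this sketch does not pre-seed every keyword with 0 for each date; a keyword only appears under a date if some row matched it.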
for date, key, val in csv_reader:

val = int(val)

if any(name in key for name in key_word):
    for name in key_word:
        if name in key:
            key = name

for name in key_word:
    if name in key:
        key = name

complete_dict = defaultdict(lambda: defaultdict(int))
...
for date, key, val in csv_reader:
    ...
    date = time.strptime(date, "%m/%d/%y")
    for name in key_word:
        ...
    complete_dict[date][key] += val  # don't need any checks for keys

Context
StackExchange Code Review Q#63084, answer score: 3