CSV parsing program that creates distinct header rows with transaction rows underneath

Problem

My code reads in the data using DictReader, then creates a header row that contains my composite key (PEOPLE_ID, DON_DATE), and then adds various values that are distinct to each section. The output looks like this:

-01- PEOPLE_ID, DON_DATE, etc...
-02- dataline
-02- dataline
-01- ...
etc...


I'm looking to possibly simplify or streamline this code, and then could use advice on how to implement robust error-handling throughout. Here is my program:

```
#!/usr/bin/python
# pre_process.py
import csv
import sys

def main():
    infile = sys.argv[1]
    outfile = sys.argv[2]
    with open(infile, 'rbU') as in_obj:
        reader, fieldnames = open_reader(in_obj)
        reader = sorted(reader, key=lambda key: (key['PEOPLE_ID'],
                                                 key['DON_DATE']))
    header_list = create_header_list(reader)
    master_dict = mapData(header_list, reader)
    writeData(master_dict, outfile, fieldnames)

def open_reader(file_obj):
    reader = csv.DictReader(file_obj, delimiter=',')
    return reader, reader.fieldnames

def create_header_list(dict_obj):
    p_id_list = []
    for row in dict_obj:
        if (row['PEOPLE_ID'], row['DON_DATE']) not in p_id_list:
            p_id_list.append((row['PEOPLE_ID'], row['DON_DATE']))
    return p_id_list

def mapData(header_list, dict_obj):
    master_dict = {}
    client_section_list = []
    for element in header_list:
        for row in dict_obj:
            if (row['PEOPLE_ID'], row['DON_DATE']) == element:
                client_section_list.append(row)
        element = list(element)
        element_list = [client_section_list[0]['DEDUCT_AMT'],
                        client_section_list[0]['ND_AMT'],
                        client_section_list[0]['DEDUCT_YTD'],
                        client_section_list[0]['NONDEDUCT_YTD']
                        ]
        try:
            element_list.append((float(client_section_list[0]['DEDUCT_YTD']) +
                                 float(client
```

Solution

Don't reuse names for multiple purposes

Before this line, reader is a DictReader,
after this line it's a list:

reader = sorted(reader, key=lambda key: (key['PEOPLE_ID'], 
                                             key['DON_DATE']))


This can be confusing. It would be better to give the sorted result its own name.
And it gets worse: the sorted list is then passed to create_header_list and mapData as a parameter named "dict_obj", even though it is a list of dicts rather than a dict,
which further adds to the confusion.
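
A minimal sketch of the renaming, written for Python 3 (the question's code targets Python 2). The names load_sorted_rows and rows are hypothetical, and the sample data is made up for illustration:

```python
import csv
import io
from operator import itemgetter


def load_sorted_rows(file_obj):
    """Read CSV rows and return them as a list sorted by the composite key."""
    reader = csv.DictReader(file_obj)  # this name always refers to the DictReader
    # The sorted result gets its own name, because it is a list of row dicts.
    rows = sorted(reader, key=itemgetter('PEOPLE_ID', 'DON_DATE'))
    return rows, reader.fieldnames


# Made-up sample data, for illustration only.
sample = io.StringIO(
    "PEOPLE_ID,DON_DATE,DEDUCT_AMT\n"
    "2,2015-01-05,10.00\n"
    "1,2015-02-01,5.00\n"
    "1,2015-01-01,7.50\n"
)
rows, fieldnames = load_sorted_rows(sample)
```

With distinct names, a reader of the code can tell at a glance whether a function receives the one-shot DictReader or the reusable sorted list.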

Simplify set creation

This function essentially creates a set:

def create_header_list(dict_obj):
    p_id_list = []
    for row in dict_obj:
        if (row['PEOPLE_ID'], row['DON_DATE']) not in p_id_list:
            p_id_list.append((row['PEOPLE_ID'], row['DON_DATE']))
    return p_id_list


The not in check is inefficient, because on a list it's an $O(n)$ operation, and it runs once per row.

It would be simpler and more efficient to use a set:

def create_header_list(dict_obj):
    p_id_set = set()
    for row in dict_obj:
        p_id_set.add((row['PEOPLE_ID'], row['DON_DATE']))
    return p_id_set


Or even:

def create_header_list(dict_obj):
    return set([(row['PEOPLE_ID'], row['DON_DATE']) for row in dict_obj])


If the ordering of the elements is important (a set does not preserve it),
then instead of a set, you can use the keys of an OrderedDict,
which are unique and remember insertion order.
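
A minimal sketch of that order-preserving variant, assuming the rows are dicts like those produced by DictReader. The parameter name rows and the sample rows are made up for illustration:

```python
from collections import OrderedDict


def create_header_list(rows):
    """Return the distinct (PEOPLE_ID, DON_DATE) pairs in first-seen order."""
    # OrderedDict keys are unique and keep insertion order,
    # so duplicates collapse without losing the original ordering.
    keys = OrderedDict.fromkeys(
        (row['PEOPLE_ID'], row['DON_DATE']) for row in rows
    )
    return list(keys)


# Made-up sample rows, for illustration only.
sample_rows = [
    {'PEOPLE_ID': '1', 'DON_DATE': '2015-01-01'},
    {'PEOPLE_ID': '1', 'DON_DATE': '2015-01-01'},
    {'PEOPLE_ID': '2', 'DON_DATE': '2015-01-05'},
    {'PEOPLE_ID': '1', 'DON_DATE': '2015-02-01'},
]
headers = create_header_list(sample_rows)
```

Each key is inserted with a hash lookup rather than a scan of a list, so the whole pass stays linear in the number of rows.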

Running Python scripts

Not all systems have Python at /usr/bin/python. The recommended shebang for Python scripts is:

#!/usr/bin/env python


Follow PEP8

PEP8 is the coding style guide for Python.
Among other things,
it recommends using snake_case for variable and function names.
Two of the functions, mapData and writeData, violate that.

Even if you disagree with a specific naming convention,
mixing two naming styles in the same program, such as create_header_list and mapData, goes against good naming practice.

Context

StackExchange Code Review Q#101323, answer score: 3
