CSV parsing program that creates distinct header rows with transaction rows underneath
Problem
My code reads in the data using
`DictReader`, then creates a header row that contains my composite key (`PEOPLE_ID`, `DON_DATE`), and then adds various values that are distinct to each section. The output looks like this:

```
-01- PEOPLE_ID, DON_DATE, etc...
-02- dataline
-02- dataline
-01- ...
etc...
```

I'm looking to possibly simplify or streamline this code, and then could use advice on how to implement robust error-handling throughout. Here is my program:
```
#!/usr/bin/python
# pre_process.py
import csv
import sys


def main():
    infile = sys.argv[1]
    outfile = sys.argv[2]
    with open(infile, 'rbU') as in_obj:
        reader, fieldnames = open_reader(in_obj)
        reader = sorted(reader, key=lambda key: (key['PEOPLE_ID'],
                                                 key['DON_DATE']))
        header_list = create_header_list(reader)
        master_dict = mapData(header_list, reader)
        writeData(master_dict, outfile, fieldnames)


def open_reader(file_obj):
    reader = csv.DictReader(file_obj, delimiter=',')
    return reader, reader.fieldnames


def create_header_list(dict_obj):
    p_id_list = []
    for row in dict_obj:
        if (row['PEOPLE_ID'], row['DON_DATE']) not in p_id_list:
            p_id_list.append((row['PEOPLE_ID'], row['DON_DATE']))
    return p_id_list


def mapData(header_list, dict_obj):
    master_dict = {}
    client_section_list = []
    for element in header_list:
        for row in dict_obj:
            if (row['PEOPLE_ID'], row['DON_DATE']) == element:
                client_section_list.append(row)
        element = list(element)
        element_list = [client_section_list[0]['DEDUCT_AMT'],
                        client_section_list[0]['ND_AMT'],
                        client_section_list[0]['DEDUCT_YTD'],
                        client_section_list[0]['NONDEDUCT_YTD']
                        ]
        try:
            element_list.append((float(client_section_list[0]['DEDUCT_YTD']) +
                                 float(client
```
Solution
Don't reuse names for multiple purposes
Before this line, `reader` is a `DictReader`; after this line it's a `list`:

```
reader = sorted(reader, key=lambda key: (key['PEOPLE_ID'],
                                         key['DON_DATE']))
```

This can be confusing. It would be better to name the result something else.

And it gets worse: this new `reader` is passed to `create_header_list` and `mapData` as a parameter named `dict_obj`, which further adds to the confusion.
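A minimal sketch of the reading step with distinct names, assuming Python 3 (the original opens the file with `'rbU'`, a Python 2 idiom; in Python 3 the `csv` documentation recommends opening the file with `newline=''`). The helper name `read_sorted_rows` and the variable `sorted_rows` are hypothetical, and `operator.itemgetter` stands in for the lambda.

```python
import csv
import operator


def read_sorted_rows(file_obj):
    """Return (fieldnames, sorted_rows), sorted by (PEOPLE_ID, DON_DATE).

    Hypothetical helper: the DictReader keeps the name `reader`, and the
    sorted list gets its own name, so no name changes type along the way.
    """
    reader = csv.DictReader(file_obj, delimiter=',')
    # sorted() consumes the reader and returns a new list of dicts.
    # Note: keys are compared as strings, exactly as in the original lambda.
    sorted_rows = sorted(reader,
                         key=operator.itemgetter('PEOPLE_ID', 'DON_DATE'))
    return reader.fieldnames, sorted_rows
```

In `main`, this would be called as `fieldnames, sorted_rows = read_sorted_rows(in_obj)`, and `sorted_rows` is then passed along under that name instead of as `dict_obj`.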
Simplify set creation
This function essentially creates a set:
```
def create_header_list(dict_obj):
    p_id_list = []
    for row in dict_obj:
        if (row['PEOPLE_ID'], row['DON_DATE']) not in p_id_list:
            p_id_list.append((row['PEOPLE_ID'], row['DON_DATE']))
    return p_id_list
```

The `not in` check is inefficient, because on a list it's an O(n) operation, which makes the whole loop up to O(n²) in the number of rows. It would be simpler and more efficient to use a `set`, which discards duplicates on its own in O(1) average time per insertion:

```
def create_header_list(dict_obj):
    p_id_set = set()
    for row in dict_obj:
        p_id_set.add((row['PEOPLE_ID'], row['DON_DATE']))
    return p_id_set
```

Or even:

```
def create_header_list(dict_obj):
    return set([(row['PEOPLE_ID'], row['DON_DATE']) for row in dict_obj])
```

If the ordering of the elements is important, then instead of a `set`, you can use an `OrderedDict`, as suggested by this post.
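Because `main` sorts the rows before calling `create_header_list`, the header order probably does matter here, and a plain `set` would lose it. A minimal sketch of the order-preserving variant, using `collections.OrderedDict.fromkeys` to drop duplicates while keeping first-seen order (on Python 3.7+ a plain `dict` also preserves insertion order). The parameter name `rows` is an assumption, chosen in line with the naming advice above.

```python
from collections import OrderedDict


def create_header_list(rows):
    """Return the distinct (PEOPLE_ID, DON_DATE) keys in first-seen order."""
    keys = ((row['PEOPLE_ID'], row['DON_DATE']) for row in rows)
    # fromkeys keeps only the first occurrence of each key; each insertion
    # is O(1) on average, unlike a `not in` scan over a list.
    return list(OrderedDict.fromkeys(keys))
```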
Running Python scripts
Not all systems have Python at `/usr/bin/python`. The recommended shebang for Python scripts is:

```
#!/usr/bin/env python
```

Follow PEP8
PEP8 is the coding style guide for Python. Among other things, it recommends using `snake_case` for variable and function names. Several functions violate that. Even if you disagree with a specific naming convention, it's a universal violation of good naming practices to mix two kinds of naming styles in the same program, such as `create_header_list` and `mapData`.
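With `snake_case` names throughout, the grouping loop at the start of `mapData` could be written as the hypothetical `group_rows` helper sketched below; it is not the original code. Since the rows are already sorted by (`PEOPLE_ID`, `DON_DATE`), `itertools.groupby` can collect each key's rows in a single pass, instead of a separate header list plus a nested loop that rescans every row for every key. The per-section values the original goes on to compute (`DEDUCT_AMT`, `ND_AMT`, and so on) are not reproduced here.

```python
import itertools
import operator

# Composite key used throughout the original program (a PEP8-style constant).
SECTION_KEY = operator.itemgetter('PEOPLE_ID', 'DON_DATE')


def group_rows(sorted_rows):
    """Return a list of (key, rows) pairs in the order the keys appear.

    Assumes `sorted_rows` is already sorted by SECTION_KEY: groupby only
    merges *adjacent* rows that share a key.
    """
    return [(key, list(section))
            for key, section in itertools.groupby(sorted_rows, key=SECTION_KEY)]
```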
Context
StackExchange Code Review Q#101323, answer score: 3