patternpythonMinor
Python data massager
Viewed 0 times
datapythonmassager
Problem
Below is a pretty simple Python script to ingest some data, massage it as necessary, append a column, sort it, and then write it back to another file. Still learning all the Python best practices and APIs, so as always, any tips and tricks are appreciated! Feel free to be brutal with me if it's horrible. It's my first time trying any kind of object-orientation with the language as well, so I have no idea if I'm massacring that or not.
```
from collections import OrderedDict
import sys
class SkuSequence:
def __init__(self, iln, ili, sn, psn, ssd, ssc, idc):
self.inv_loc_nm = iln.strip()
self.inv_loc_id = ili
self.sku_nbr = sn
self.prt_seq_nbr = psn
self.sign_sz_desc = ssd
self.sign_sz_cd = ssc
self.inv_disp_cd = idc.strip()
self.mod_seq_nbr = "null"
def get_str_attrs(self):
attrs = []
attrs.append(str(self.inv_loc_nm))
attrs.append(str(self.inv_loc_id))
attrs.append(str(self.sku_nbr))
attrs.append(str(self.prt_seq_nbr))
attrs.append(str(self.sign_sz_desc))
attrs.append(str(self.sign_sz_cd))
attrs.append(str(self.inv_disp_cd))
attrs.append(str(self.mod_seq_nbr))
return attrs
def sort_sequence(sequence):
dict = {}
for sku in sequence:
if sku.inv_loc_id not in dict:
dict[sku.inv_loc_id] = {}
dict[sku.inv_loc_id][sku.prt_seq_nbr] = sku
for bay in dict:
dict[bay] = OrderedDict(sorted(dict[bay].items()))
return OrderedDict(sorted(dict.items()))
def convert_to_csv(header, data):
rows = []
if header is not None:
rows.append(",".join(header.get_str_attrs()))
for bay, sequence_data in data.items():
seq_nbr = 0
for sku in sequence_data.values():
if int(sku.inv_disp_cd) == 5:
sku.mod_seq_nbr = "null"
else:
seq_nbr += 1
sku.mod_seq_nbr = seq_nbr
rows.app
```
from collections import OrderedDict
import sys
class SkuSequence:
def __init__(self, iln, ili, sn, psn, ssd, ssc, idc):
self.inv_loc_nm = iln.strip()
self.inv_loc_id = ili
self.sku_nbr = sn
self.prt_seq_nbr = psn
self.sign_sz_desc = ssd
self.sign_sz_cd = ssc
self.inv_disp_cd = idc.strip()
self.mod_seq_nbr = "null"
def get_str_attrs(self):
attrs = []
attrs.append(str(self.inv_loc_nm))
attrs.append(str(self.inv_loc_id))
attrs.append(str(self.sku_nbr))
attrs.append(str(self.prt_seq_nbr))
attrs.append(str(self.sign_sz_desc))
attrs.append(str(self.sign_sz_cd))
attrs.append(str(self.inv_disp_cd))
attrs.append(str(self.mod_seq_nbr))
return attrs
def sort_sequence(sequence):
dict = {}
for sku in sequence:
if sku.inv_loc_id not in dict:
dict[sku.inv_loc_id] = {}
dict[sku.inv_loc_id][sku.prt_seq_nbr] = sku
for bay in dict:
dict[bay] = OrderedDict(sorted(dict[bay].items()))
return OrderedDict(sorted(dict.items()))
def convert_to_csv(header, data):
rows = []
if header is not None:
rows.append(",".join(header.get_str_attrs()))
for bay, sequence_data in data.items():
seq_nbr = 0
for sku in sequence_data.values():
if int(sku.inv_disp_cd) == 5:
sku.mod_seq_nbr = "null"
else:
seq_nbr += 1
sku.mod_seq_nbr = seq_nbr
rows.app
Solution
To summarize, the things I expect could be improved include documentation (docstrings and/or comments), variable names (especially the ones that don't convey much, or convey things that don't match behavior), the coupling of functions (
The first things I dwell on as a reader are the names. What in the world is
The use of
The interface or name of
Alternately
Referring directly to
write_file may misbehave if sys.argv[1] is wrong, or if its input isn't sorted), and a couple of places that using some specialized collections could clean things up.The first things I dwell on as a reader are the names. What in the world is
iln vs ili, etc. Is this a foreign language? An acronym in heavy usage in your field? If neither, it's generally considered more pythonic to use longer with more obvious meanings. (Note that even the longer attribute names these get on SkuSequence instances are a little abbreviated for my taste.) This can be mitigated with comments or docstrings, but I prefer to have the code stand on its own as much as possible.The use of
OrderedDict in sort_sequence seems fishy to me. Typically OrderedDict is used to remember the insertion order, but here you are sorting the items to create essentially a SortOrderedDict. What utility does this provide that a regular dict does not? Perhaps this should be documented in sort_sequence's docstring, or perhaps the sorting should be handled in convert_to_csv. By the way, the dance to ensure that dict[sku.inv_loc_id] exists would be better written by using a collections.defaultdict(dict) and just pretending it's already there. It also shouldn't be called dict as that shadows the builtin of the same name.The interface or name of
consume_data seems a little off. Calling it consume_data suggests that it could take the output of csv.readlines() instead of taking the name of the file that it then reads. Furthermore, if the file is truly csv, you're better off using a real csv parser such as that in the csv module, especially if any values could ever contain a comma. Finally, unless you are discarding some of the values, the creation of SkuSequence could take SkuSequence(*attrs) or even SkuSequence(line.split(",")).Alternately
SkuSequence could be rewritten to accept a list or tuple, or perhaps even a collections.namedtuple. Using a namedtuple might clean up some code I don't particularly like - namely the very repetitive parts of __init__ and get_str_attrs. My gut tells me it is worth exploring using namedtuple more closely, but I haven't proven it yet.Referring directly to
sys.argv[1] in write_file also seems a little unusual to me; it tightly couples it to the script rather than letting it be used in library, or even with a better command-line parser. Note that by contrast consume_data accepts the name of the file, but write_file magically divines it. It doesn't bother me as much that write_file modifies the name by adding _resequence (although it should probably use helpers from os.path for dicing and splicing the filename); make sure the parameter (that it doesn't yet take reflects) this in its name, say source_file or base_name. Coupling would probably be even lower (better) if main decided the name (even if by calling a new helper function); as this would allow the script to function "in place" if ever directed to.Context
StackExchange Code Review Q#39312, answer score: 2
Revisions (0)
No revisions yet.