Portable Python CSV class
csv, class, python, portable
Problem
I have been working on a project where I needed to analyze multiple, large datasets contained inside many CSV files at the same time. I am not a programmer but an engineer, so I did a lot of searching and reading. Python's stock CSV module provides the basic functionality, but I had a lot of trouble getting the methods to run quickly on 50k-500k rows since many strategies were simply appending. I had lots of problems getting what I wanted and I saw the same questions asked over and over again. I decided to spend some time and write a class that performed these functions and would be portable. If nothing else, myself and other people I work with could use it.
I would like some input on the class and any suggestions you may have. I am not a programmer and don't have any formal background, so this has been a good OOP intro for me. The end result is that in two lines you can read all CSV files in a folder into memory, either as pure Python lists or as lists of NumPy arrays. I have tested it in many scenarios and hopefully found most of the bugs. I'd like to think this is good enough that other people can just copy and paste it into their code and move on to the more important stuff. I am open to all critiques and suggestions. Is this something you could use? If not, why?
You can try it with generic CSV data. The standard Python lists are flexible in size and data type. NumPy will only work with numeric (float specifically) data that is rectangular in format:
```
x, y, z,
1, 2, 3,
4, 5, 6,
...
```
```
import numpy as np
import csv
import os
import sys

class EasyCSV(object):
    """Easily open from and save CSV files using lists or numpy arrays.

    Initiating and using the class is as easy as CSV = EasyCSV('location').
    The class takes the following arguments:
    EasyCSV(location, width=None, np_array='false', skip_rows=0)
    location is the only mandatory field and is a string of the folder location
    containing .CSV file(s).
    width is optional and specifies a constant
```
Solution
Some observations:
- You expect `read` to be called exactly once (otherwise it reads the same files again, right?). You might as well call it from `__init__` directly. Alternatively, `read` could take `location` as a parameter, so one could read multiple directories into the object.
- You use the strings `'true'`, `'false'` where you should use actual `bool` values `True`, `False`.
- You set instance variables such as `self.key = key` that you use only locally inside the function, where you could simply use the local variable `key`.
- The `read` method is very long. Divide the work into smaller functions and call them from `read`.
- You have docstrings and a fair amount of comments, good. But then you have really cryptic statements such as `self.i = 0`.
- Some variable names are misleading, such as `files`, which is actually a single filename.
- Don't change the working directory (`os.chdir`). Use `os.path.join(loc, filename)` to construct paths. (If you think it's OK to change it, think what happens if you combine this module with some other module that also thinks it's OK.)
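To illustrate a few of these points together, here is a minimal sketch, assuming Python 3. The names `read_csv_folder`, `read_one_file` and `as_float` are hypothetical and not part of the original `EasyCSV` class. It takes the folder as a parameter, uses a real `bool` flag, keeps loop variables local, splits the work into two small functions, and builds paths with `os.path.join` instead of calling `os.chdir`.

```python
import csv
import os


def read_one_file(path, skip_rows=0, as_float=False):
    """Read a single CSV file into a list of rows (hypothetical helper)."""
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle))[skip_rows:]
    if as_float:
        # Assumes every remaining cell is numeric; float() raises otherwise.
        rows = [[float(cell) for cell in row] for row in rows]
    return rows


def read_csv_folder(location, skip_rows=0, as_float=False):
    """Read every .csv file in `location` into a dict keyed by filename."""
    tables = {}
    for filename in sorted(os.listdir(location)):
        if not filename.lower().endswith(".csv"):
            continue
        # Build the full path explicitly; the working directory is untouched.
        path = os.path.join(location, filename)
        tables[filename] = read_one_file(path, skip_rows, as_float)
    return tables
```

Because `location` is an argument rather than state set up in advance, the same function can be called for several folders, and nothing else in the program is affected by a changed working directory.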
Context
StackExchange Code Review Q#24836, answer score: 6