Portable Python CSV class

Submitted by: @import:stackexchange-codereview

Problem

I have been working on a project where I needed to analyze multiple large datasets spread across many CSV files at the same time. I am an engineer, not a programmer, so I did a lot of searching and reading. Python's stock CSV module provides the basic functionality, but I had a lot of trouble getting the methods to run quickly on 50k-500k rows, since many of the strategies I found simply append. I had lots of problems getting what I wanted, and I saw the same questions asked over and over again. I decided to spend some time writing a class that performs these functions and is portable. If nothing else, my colleagues and I could use it.

I would like some input on the class and any suggestions you may have. I am not a programmer and don't have any formal background, so this has been a good OOP intro for me. The end result is that, in two lines, you can read all CSV files in a folder into memory as either pure Python lists or lists of NumPy arrays. I have tested it in many scenarios and hopefully found most of the bugs. I'd like to think this is good enough that other people can just copy and paste it into their code and move on to the more important stuff. I am open to all critiques and suggestions. Is this something you could use? If not, why?

You can try it with generic CSV data. The standard Python lists are flexible in size and data type. NumPy will only work with numeric (float specifically) data that is rectangular in format:

```
x, y, z,
1, 2, 3,
4, 5, 6,
...
```

```python
import numpy as np
import csv
import os
import sys

class EasyCSV(object):
    """Easily open from and save CSV files using lists or numpy arrays.

    Initiating and using the class is as easy as CSV = EasyCSV('location').
    The class takes the following arguments:

    EasyCSV(location, width=None, np_array='false', skip_rows=0)

    location is the only mandatory field and is a string of the folder location
    containing .CSV file(s).

    width is optional and specifies a constant
```

Solution

Some observations:

  • You expect read to be called exactly once (otherwise it reads the same files again, right?). You might as well call it from __init__ directly. Alternatively, read could take location as a parameter, so one could read multiple directories into the same object.

  • You use the strings 'true' and 'false' where you should use the actual bool values True and False.

  • You set instance variables such as self.key = key that you use only locally inside the function, where you could simply use the local variable key.

  • The read method is very long. Divide the work into smaller functions and call them from read.

  • You have docstrings and a fair amount of comments, which is good. But then you have really cryptic statements such as self.i = 0.

  • Some variable names are misleading, such as files, which actually holds a single filename.

  • Don't change the working directory (os.chdir). Use os.path.join(loc, filename) to construct paths. (If you think it's OK to change it, think about what happens if you combine this module with some other module that also thinks it's OK.)
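
Several of these points can be illustrated together. Since the original class body is cut off in the question, the following is only a rough sketch of how it might be restructured, not the author's code. It assumes Python 3, and the helper names (_csv_filenames, _read_rows, _to_array) and the choice to key the results by full path are hypothetical. The width argument is left out because its meaning is not visible in the truncated docstring.

```python
import csv
import os


class EasyCSV(object):
    """Hypothetical restructuring for illustration; not the original class."""

    def __init__(self, location=None, np_array=False, skip_rows=0):
        # Real booleans instead of the strings 'true' / 'false'.
        self.np_array = np_array
        self.skip_rows = skip_rows
        self.data = {}  # maps full file path -> rows (or array)
        if location is not None:
            self.read(location)

    def read(self, location):
        """Load every .csv file in `location`; can be called for several folders."""
        for filename in self._csv_filenames(location):
            # Build the path explicitly instead of calling os.chdir().
            path = os.path.join(location, filename)
            rows = self._read_rows(path)  # plain local, not self.rows
            self.data[path] = self._to_array(rows) if self.np_array else rows
        return self.data

    @staticmethod
    def _csv_filenames(location):
        """Return the sorted names of the CSV files directly inside `location`."""
        return sorted(name for name in os.listdir(location)
                      if name.lower().endswith('.csv'))

    def _read_rows(self, path):
        """Read one file into a list of rows, dropping the first skip_rows rows."""
        with open(path, newline='') as handle:
            rows = list(csv.reader(handle))
        return rows[self.skip_rows:]

    @staticmethod
    def _to_array(rows):
        """Convert rectangular, numeric rows to a float array."""
        import numpy as np  # imported lazily so list-only use does not need NumPy
        return np.array(rows, dtype=float)
```

With this layout, EasyCSV('some/folder', skip_rows=1) reads a folder on construction, and a later call to read('another/folder') adds a second folder to the same object. The process's working directory is never touched, so the class should not interfere with other modules that rely on it.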

Context

StackExchange Code Review Q#24836, answer score: 6
