patternpythonMinor
Reading in a series of Excel files
Viewed 0 times
seriesreadingfilesexcel
Problem
I needed to create a program which reads in a series of Excel files in folders typically structured something like this.
The number of subdirectories (aka devices) under root and number of time intervals (e.g. 0hr.xls) can change.
I read the files in, do some calculations, then write the data to an Excel sheet, and create some excel plots. I'm wondering how I can improve it and if I should be using more of an object oriented approach.
`#!/usr/local/bin/python
import pandas as pd # Pandas for data structures useful this program
import numpy as np # Numpy for various math structures and functions
from scipy import stats # Scipy statistics functions
import os, sys # Base functions for accessing data on the computer
import argparse # For parsing input arguments from the commandline
from natsort import natsorted
from xlsxwriter.utility import xl_rowcol_to_cell
class FullPaths(argparse.Action):
"""Expand user- and relative-paths"""
def __call__(self, parser, namespace, values, option_string=None):
setattr(namespace, self.dest, os.path.abspath(os.path.expanduser(values)))
def is_dir(dirname):
"""Checks if a path is an actual directory"""
if not os.path.isdir(dirname):
msg = "{0} is not a directory".format(dirname)
raise argparse.ArgumentTypeError(msg)
else:
return dirname
def get_args():
"""Get CLI arguments and options"""
parser = argparse.ArgumentParser()
parser.add_argument('EXPROOTPATH',
help="path to files and folders for the experiment",
action=FullPaths, type=is_dir)
parser.add_argument('--version', action='version', version='%(prog)s 1.0')
results = parser.parse_args()
return results
def read_angiotool_files():
""" Read in
rootDir
Control
0hr.xls
24hr.xls
48hr.xls
72hr.xls
0.01um
0hr.xls
24hr.xls
48hr.xls
72hr.xls
0.1um
0hr.xls
24hr.xls
48hr.xls
72hr.xls
The number of subdirectories (aka devices) under root and number of time intervals (e.g. 0hr.xls) can change.
I read the files in, do some calculations, then write the data to an Excel sheet, and create some excel plots. I'm wondering how I can improve it and if I should be using more of an object oriented approach.
`#!/usr/local/bin/python
import pandas as pd # Pandas for data structures useful this program
import numpy as np # Numpy for various math structures and functions
from scipy import stats # Scipy statistics functions
import os, sys # Base functions for accessing data on the computer
import argparse # For parsing input arguments from the commandline
from natsort import natsorted
from xlsxwriter.utility import xl_rowcol_to_cell
class FullPaths(argparse.Action):
"""Expand user- and relative-paths"""
def __call__(self, parser, namespace, values, option_string=None):
setattr(namespace, self.dest, os.path.abspath(os.path.expanduser(values)))
def is_dir(dirname):
"""Checks if a path is an actual directory"""
if not os.path.isdir(dirname):
msg = "{0} is not a directory".format(dirname)
raise argparse.ArgumentTypeError(msg)
else:
return dirname
def get_args():
"""Get CLI arguments and options"""
parser = argparse.ArgumentParser()
parser.add_argument('EXPROOTPATH',
help="path to files and folders for the experiment",
action=FullPaths, type=is_dir)
parser.add_argument('--version', action='version', version='%(prog)s 1.0')
results = parser.parse_args()
return results
def read_angiotool_files():
""" Read in
Solution
I think there are a few tweaks that you could benefit from here and that would make your code more extensible, organized, and clean.
What you're doing here is packaging all of your program's functionality in discrete, uncategorized functions. This is fine if your program is small and its structure uncomplicated, but with larger systems such an approach is unfeasible. It makes your program's control flow confusing, and it limits others (or yourself) in their ability to extend your program. This is because in order to modify what you've written, the programmer must understand EVERYTHING about your code - what each function does and how it works. In contrast, an object-oriented approach allows others to understand your program by understanding how the parts (classes) interact with one another without getting caught up in the details of how each class is implemented.
Here are my suggestions:
-
Divide your program into logical, self-contained units. One part of the program can be devoted to handling Excel Files, one part for handling the data contained in those files, one part for handling filesystems, one part for creating plots, etc.
-
Create classes based on these units. For example, create a generic File class in which you define general methods such as create_file, read_from_file, write_to_file, etc. and general class variables such as filepath, file_extension, file_directory, etc. that all classes share. You can then create a subclass called ExcelFile that inherits from the File class. In this class you inherit all the general file properties, so you only have to worry about implementing excel-specific methods. For example, you might have methods called apply_formula_to_column, or generate_excel_data_graph. The advantage to this design strategy is that if in the future you decide you want your program to process data from another type of file, say, for example, files of the type generated by the open office excel counterpart. Then you can create a new class called OpenOfficeFile that inherits from ExcelFile and you won't have to touch any of the code you've already written.
-
You can make similar classes for other parts of your program. I would recommend creating an abstract Data class in which you define generic data-validation functions. Then you can subclass Data for each type of data that you extract from the excel file. You can also create a class related to graphics that can handle stuff like drawing plots of the data you have.
-
Doing all this should make your main loop very simple. All such a loop would have to do is initialize an ExcelFile class, call ExcelFile.read_data(), pass that result into the init method of the Data class, and then pass that to the graphics class. Then you could call File.write() to write the results of your calculations to a file.
Hope this helps!
What you're doing here is packaging all of your program's functionality in discrete, uncategorized functions. This is fine if your program is small and its structure uncomplicated, but with larger systems such an approach is unfeasible. It makes your program's control flow confusing, and it limits others (or yourself) in their ability to extend your program. This is because in order to modify what you've written, the programmer must understand EVERYTHING about your code - what each function does and how it works. In contrast, an object-oriented approach allows others to understand your program by understanding how the parts (classes) interact with one another without getting caught up in the details of how each class is implemented.
Here are my suggestions:
-
Divide your program into logical, self-contained units. One part of the program can be devoted to handling Excel Files, one part for handling the data contained in those files, one part for handling filesystems, one part for creating plots, etc.
-
Create classes based on these units. For example, create a generic File class in which you define general methods such as create_file, read_from_file, write_to_file, etc. and general class variables such as filepath, file_extension, file_directory, etc. that all classes share. You can then create a subclass called ExcelFile that inherits from the File class. In this class you inherit all the general file properties, so you only have to worry about implementing excel-specific methods. For example, you might have methods called apply_formula_to_column, or generate_excel_data_graph. The advantage to this design strategy is that if in the future you decide you want your program to process data from another type of file, say, for example, files of the type generated by the open office excel counterpart. Then you can create a new class called OpenOfficeFile that inherits from ExcelFile and you won't have to touch any of the code you've already written.
-
You can make similar classes for other parts of your program. I would recommend creating an abstract Data class in which you define generic data-validation functions. Then you can subclass Data for each type of data that you extract from the excel file. You can also create a class related to graphics that can handle stuff like drawing plots of the data you have.
-
Doing all this should make your main loop very simple. All such a loop would have to do is initialize an ExcelFile class, call ExcelFile.read_data(), pass that result into the init method of the Data class, and then pass that to the graphics class. Then you could call File.write() to write the results of your calculations to a file.
Hope this helps!
Context
StackExchange Code Review Q#86625, answer score: 2
Revisions (0)
No revisions yet.