Reading columns and rows in a .csv file
Problem
I have some data in a .csv file, which looks roughly like this:
```
[fragment1, peptide1, gene1, replicate1, replicate2, replicate3]
[fragment1, peptide2, gene1, replicate1, replicate2, replicate3]
[fragment2, peptide1, gene2, replicate1, replicate2, replicate3]
[fragment2, peptide2, gene2, replicate1, replicate2, replicate3]
[fragment3, peptide1, gene2, replicate1, replicate2, replicate3]
```
The problem is that I need to use this data (the three replicates) in several different ways:
1. Over each row (i.e. just `replicate1`-`3` for each row)
2. Over each replicate column for each fragment (i.e. `replicate1` from `peptides` 1 and 2 from `fragment1`, and the same for `replicate2` and `3`)
3. Over each replicate column for each gene (i.e. the same as (2), but using genes instead of fragments)
The data files all have the same columns, but the number of rows (i.e. the number of fragments/peptides/genes) varies, so I have to read the data without specifying row numbers. What I need, essentially, is statistics (coefficients of variation) across each row, across each fragment and across each gene.
The variant across rows just uses the three replicates (always three values from one row) and is, of course, very simple to compute. The variants across fragments and across genes both first calculate statistics from every applicable `replicate1`, then every `replicate2`, then every `replicate3` (i.e. an unknown number of values from an unknown number of rows), and then compute the same statistics on the values just calculated (i.e. always three values).
I have a script that almost does this, but it's very long and (I think) overly complicated. I basically read the file three times, each time gathering the data in the different ways described, mostly in lists and sometimes in `numpy.array`s.
```
with open('Data/MS - PrEST + Sample/' + data_file,'rU') as in_file:
    reader = csv.reader(in_file,delimiter=';')
    x = -1
    data = numpy.array(['PrEST ID','Genes
```
Solution
First off, if you want re-usability, you should probably encapsulate this into a function with its specific arguments.
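As a minimal sketch of what such encapsulation might look like, assuming a semicolon-delimited file with a single header row: the names `read_table` and `column_index` and their parameters are hypothetical and do not come from the original script. Only the `'PrEST ID'` header name appears in the question.

```python
import csv


def read_table(path, delimiter=';'):
    """Return (header, rows) from a delimited text file with one header row.

    `read_table` is an illustrative name, not part of the original script.
    """
    with open(path, newline='') as in_file:
        reader = csv.reader(in_file, delimiter=delimiter)
        header = next(reader)  # first line holds the column names
        rows = list(reader)    # remaining lines, however many there are
    return header, rows


def column_index(header, name):
    """Look up a column position by name instead of looping over the header."""
    return header.index(name)  # raises ValueError if the column is missing
```

With the path and delimiter passed in as arguments, the same function can be reused for every data file, and a call such as `column_index(header, 'PrEST ID')` replaces a hand-written search loop over the header row.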
Also, the general naming style is `snake_case` for functions and variables, and `PascalCase` for classes. There are other style issues as well. For example, `(PrEST,Gene,End_Copy_Number,CV)` should be changed to `(PrEST, Gene, End_Copy_Number, CV)`. To correct these, see PEP 8, Python's official style guide.
You have an `if`/`elif`/`else` block with many `continue`s.
```
for n in range(len(row)):
    if row[n] == 'PrEST ID':
        PrEST_column = n
        continue
    ...
```
These `continue`s can be removed: only one branch of an `if`/`elif`/`else` chain runs per iteration, so a `continue` at the end of a branch is redundant as long as nothing follows the chain in the loop body.
Finally, as mentioned by @DSM, you can use `pandas` to re-write this.
Code Snippets
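A rough sketch of how the `pandas` suggestion could look, not the answerer's own code. The sample data is made up, and the column names (`fragment`, `peptide`, `gene`, `replicate1` to `replicate3`) are assumptions based on the layout shown in the question. It also assumes that the first-stage statistic per group is the mean of each replicate column, and that the coefficient of variation is the sample standard deviation (the `pandas` default, `ddof=1`) divided by the mean.

```python
import io

import pandas as pd

# Made-up sample data; the column names are assumptions for illustration.
SAMPLE = """fragment;peptide;gene;replicate1;replicate2;replicate3
fragment1;peptide1;gene1;10;12;14
fragment1;peptide2;gene1;20;22;24
fragment2;peptide1;gene2;30;30;30
"""

REPLICATES = ['replicate1', 'replicate2', 'replicate3']


def row_cv(frame):
    """CV (sample std / mean) across the columns of each row."""
    return frame.std(axis=1) / frame.mean(axis=1)


def grouped_cv(df, key):
    """Mean of each replicate column per group, then the CV of those three means."""
    means = df.groupby(key)[REPLICATES].mean()
    return row_cv(means)


# The file is read once; the number of rows does not need to be known.
df = pd.read_csv(io.StringIO(SAMPLE), sep=';')

cv_per_row = row_cv(df[REPLICATES])           # one value per input row
cv_per_fragment = grouped_cv(df, 'fragment')  # one value per fragment
cv_per_gene = grouped_cv(df, 'gene')          # one value per gene
```

Because `groupby` collects however many rows belong to each fragment or gene, the file only needs to be read once. If a different first-stage statistic is wanted, only the `.mean()` call in `grouped_cv` would need to change.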
```
for n in range(len(row)):
    if row[n] == 'PrEST ID':
        PrEST_column = n
        continue
    ...
```
Context
StackExchange Code Review Q#38646, answer score: 3