patternpythonMinor
Integration of data using Simpson's rule
Viewed 0 times
ruleusingintegrationdatasimpson
Problem
I wrote a Python program that performs an integration of data using Simpson's rule. The program takes the areas to be integrated from a foo.ref file, with the following syntax:
The data that the program parses is taken from all the .xy files in the current directory, with the following syntax:
The program's final output is a tab seperated file that can be opened directly in excel, the performance is ... a lot worse than I was expecting however and I was wondering if anyone could offer some tricks to increase that (and also to have a general look at the code, if it is correct Python).
```
#! /usr/bin/env python
import glob
import os
import fileinput
print " a script to quickly calculate the area for a set"
print "of analytes. Written by Bas Jansen, Leiden University Medical"
print "Center, March 20th 2013."
print ""
# Simpson's rule
def integrate(y_vals,h):
i=1
total=y_vals[0]+y_vals[-1]
for y in y_vals[1:-1]:
if i%2 == 0:
total+=2*y
else:
total+=4*y
i+=1
return total*(h/3.0)
# Reads the reference file
ref=[]
for refs in glob.glob("*.ref"):
fr=open(refs)
for line in fr.readlines():
parts=line.rstrip('\n').split('\t')
ref.append(parts)
fr.close()
# Initialize the output file
f=open('output.xls','w')
for i in range(1,len(ref)):
f.write(ref[i][0]+"\n")
f.close()
# Find XY files in current directory
xy_files=[]
for files in glob.glob("*.xy"):
xy_files.append(files)
x=[]
area=[]
print "Parsing: "+files
fh=open(files)
for line in fh.readlines():
y=[float(i) for i in line.split()]
x.append(y)
fh.close()
# Pick the XY window
print "Progress indicator:"
prin
# peak m/z charge window
LAG_H4N4F1S1 4244.829467 1+ 0.06
LAG_H4N4F1S1 4245.832508 1+ 0.06
LAG_H4N4F1S1 4246.835549 1+ 0.06
LAG_H4N4F1S1 4247.838590 1+ 0.06The data that the program parses is taken from all the .xy files in the current directory, with the following syntax:
3497.7552 0
3497.7619 410646
3497.7685 397637
3497.7752 363672
3497.7819 321723
3497.7886 286960The program's final output is a tab seperated file that can be opened directly in excel, the performance is ... a lot worse than I was expecting however and I was wondering if anyone could offer some tricks to increase that (and also to have a general look at the code, if it is correct Python).
```
#! /usr/bin/env python
import glob
import os
import fileinput
print " a script to quickly calculate the area for a set"
print "of analytes. Written by Bas Jansen, Leiden University Medical"
print "Center, March 20th 2013."
print ""
# Simpson's rule
def integrate(y_vals,h):
i=1
total=y_vals[0]+y_vals[-1]
for y in y_vals[1:-1]:
if i%2 == 0:
total+=2*y
else:
total+=4*y
i+=1
return total*(h/3.0)
# Reads the reference file
ref=[]
for refs in glob.glob("*.ref"):
fr=open(refs)
for line in fr.readlines():
parts=line.rstrip('\n').split('\t')
ref.append(parts)
fr.close()
# Initialize the output file
f=open('output.xls','w')
for i in range(1,len(ref)):
f.write(ref[i][0]+"\n")
f.close()
# Find XY files in current directory
xy_files=[]
for files in glob.glob("*.xy"):
xy_files.append(files)
x=[]
area=[]
print "Parsing: "+files
fh=open(files)
for line in fh.readlines():
y=[float(i) for i in line.split()]
x.append(y)
fh.close()
# Pick the XY window
print "Progress indicator:"
prin
Solution
#! /usr/bin/env python
import glob
import os
import fileinput
print " a script to quickly calculate the area for a set"
print "of analytes. Written by Bas Jansen, Leiden University Medical"
print "Center, March 20th 2013."
print ""
# Simpson's rule
def integrate(y_vals,h):The recommendation for python is four spaces for indentation, you appear to be using 2
i=1
total=y_vals[0]+y_vals[-1]It's also recommended to have spaces around binary operators
for y in y_vals[1:-1]:Use
for i, y in enumerate(y_vals, 1): rather then counting your own i valueif i%2 == 0:
total+=2*y
else:
total+=4*y
i+=1
return total*(h/3.0)
# Reads the reference file
ref=[]I'd recommended not using an abbreviation her
for refs in glob.glob("*.ref"):
fr=open(refs)It's recommended to use a with block:
with open(refs) as fr:
do stuff with file
file will be automatically closed hereThat ensures your file will be closed even if an exception occours
for line in fr.readlines():
parts=line.rstrip('\n').split('\t')
ref.append(parts)You shouldn't really store the header ref here.
fr.close()
# Initialize the output file
f=open('output.xls','w')
for i in range(1,len(ref)):
f.write(ref[i][0]+"\n")
f.close()
# Find XY files in current directory
xy_files=[]
for files in glob.glob("*.xy"):
xy_files.append(files)
x=[]
area=[]
print "Parsing: "+files
fh=open(files)
for line in fh.readlines():Actually this is the same as
for line in fh:y=[float(i) for i in line.split()]
x.append(y)
fh.close()
# Pick the XY window
print "Progress indicator:"
print "0%"
for i in range(1,len(ref)):Whenever possible, don't iterate over indexes. Instead iterate over the lists. So something like
for line in ref[1:]: If you really need the indexes us enumerateobs=[]
vals=0
low=float(ref[i][1])-float(ref[i][3])
high=float(ref[i][1])+float(ref[i][3])
# Progress indicator
if i %(len(ref)/10) == 0:
print str(i/(len(ref)/10))+"0%"Why divide by 10 just to add the 0 back?
for j in range(0,len(x)):Use
for j in x: and then avoid the indexingif x[j][0] > low and x[j][0] < high:
coords=[float(l) for l in x[j]]
obs.append(coords)
vals+=1What's this for?
if obs:
y_vals=[]
h=(obs[len(obs)-1][0]-obs[0][0])/len(obs)
for k in range(0,len(obs)):
y_vals.append(obs[k][1])
area.append(integrate(y_vals,h))
# Neat trick to append the area from other input files
l = 0
for line in fileinput.input('output.xls',inplace=1):If you need to iterate over multiple sources at once, use a zip.
print "%s\t%s\n" % (line.rstrip(),area[l]),
l+=1Your neat trick is what's killing you. This is designed for processing the output of some other program. Its not good for continually adding to the end of lines you have already written. What you should do is store everything in a big list and then output it at the end.
# Display the legend (& filenames) at bottom of output file
f=open('output.xls','a')
legend = "Peak"
for m in range(0,len(xy_files)):
legend+="\t"+str(xy_files[m])
legend+="\n"
f.write(legend)
f.close()
# Being polite
raw_input("Finished, press any key to exit")Code Snippets
#! /usr/bin/env python
import glob
import os
import fileinput
print "<SimpInt.py> a script to quickly calculate the area for a set"
print "of analytes. Written by Bas Jansen, Leiden University Medical"
print "Center, March 20th 2013."
print ""
# Simpson's rule
def integrate(y_vals,h):i=1
total=y_vals[0]+y_vals[-1]for y in y_vals[1:-1]:if i%2 == 0:
total+=2*y
else:
total+=4*y
i+=1
return total*(h/3.0)
# Reads the reference file
ref=[]for refs in glob.glob("*.ref"):
fr=open(refs)Context
StackExchange Code Review Q#24160, answer score: 2
Revisions (0)
No revisions yet.