patternpythonMinor
Optimisation of auto-correlation function
Viewed 0 times
autofunctionoptimisationcorrelation
Problem
I'm new to coding and have written a script from scratch to do an auto-correlation function. It's too slow and I have to run it on a few thousand files over the next few days. I'm sure it's terribly written (and you can see my brain ticking in basic ways through it) and can be faster.
I know that Python has an inbuilt autocorrelation function, but I think it's restricted to the sense that they use the word in physics. This is for checking residency times of a molecule in a biomolecule simulation and I couldn't see how to apply the default.
```
import sys
import time
start = time.time()
fileName = sys.argv[1]
time = []
water = []
#make two 1d arrays for time and water
with open(fileName, "r") as input:
for line in input:
time.append(line.split()[0])
water.append(line.split()[1])
# define a totaltimescale in picoseconds - the minus 1 is
# because of the zero index
# windowsize is 2 ns
totaltimescale = 50000 - 1
windowsize = 2000
# endtau should take into consideration the multiplier
# below so 400 with a multiplier of 5 gives us
endtau = 400
alltau = range(1, endtau)
for tau in alltau:
# there is a tau multiplier (i.e. use every 5th tau
# or every 5 picoseconds to speed up calculation)
tau = 5*tau
print tau
# these three values just start counters
counter = 0
noncounter = 0
output = 0
# to set up a value for total number of windows we're interested in,
# leave off 2 times the windowsize at the end arbitrarily so it doesn't
# jump too far, may get errors eventually for this so may have to be wary
numberofwindows = totaltimescale - 2*windowsize
listofwindowstartingpoints = range(0, numberofwindows)
# outer iterator which ensures the window (of size start - start + windowsize)
# is moved down one unit each time, i.e. starts at the next picosecond
for startingpoint in listofwindowstartingpoints:
window = range(startingpoint, startingpoint + windowsize)
for a in window:
i
I know that Python has an inbuilt autocorrelation function, but I think it's restricted to the sense that they use the word in physics. This is for checking residency times of a molecule in a biomolecule simulation and I couldn't see how to apply the default.
```
import sys
import time
start = time.time()
fileName = sys.argv[1]
time = []
water = []
#make two 1d arrays for time and water
with open(fileName, "r") as input:
for line in input:
time.append(line.split()[0])
water.append(line.split()[1])
# define a totaltimescale in picoseconds - the minus 1 is
# because of the zero index
# windowsize is 2 ns
totaltimescale = 50000 - 1
windowsize = 2000
# endtau should take into consideration the multiplier
# below so 400 with a multiplier of 5 gives us
endtau = 400
alltau = range(1, endtau)
for tau in alltau:
# there is a tau multiplier (i.e. use every 5th tau
# or every 5 picoseconds to speed up calculation)
tau = 5*tau
print tau
# these three values just start counters
counter = 0
noncounter = 0
output = 0
# to set up a value for total number of windows we're interested in,
# leave off 2 times the windowsize at the end arbitrarily so it doesn't
# jump too far, may get errors eventually for this so may have to be wary
numberofwindows = totaltimescale - 2*windowsize
listofwindowstartingpoints = range(0, numberofwindows)
# outer iterator which ensures the window (of size start - start + windowsize)
# is moved down one unit each time, i.e. starts at the next picosecond
for startingpoint in listofwindowstartingpoints:
window = range(startingpoint, startingpoint + windowsize)
for a in window:
i
Solution
My first suggestion is to look into parallelization. I believe your problem is one that can be made embarassingly parallel. Each iteration of your loop is independent of each other iteration, so you could run all of them simultaneously. With sufficiently beefy hardware you could get an increase in efficiency proportional to
Also, you're doing some unnecessary work in your big loop.
len(alltau) * len(window).Also, you're doing some unnecessary work in your big loop.
- You're calculating
outputfifty thousand times, when you could just calculate it exactly once after the loop ends.
- You can keep track of
totaljumpsanalysedby incrementing it once per loop. Nononcounterneeded.
- You don't necessarily need to convert
counterto a float. Remember, python supports integers of arbitrary size, so you don't need to worry about overflow.
else: passdoes nothing, so you can remove it.
totaljumpsanalysed = 0
for startingpoint in listofwindowstartingpoints:
window = range(startingpoint, startingpoint + windowsize)
for a in window:
if a + tau < totaltimescale:
if water[a] == water[a + tau]:
counter += 1
totaljumpsanalysed += 1
output = float(counter)/float(totaljumpsanalysed)Code Snippets
totaljumpsanalysed = 0
for startingpoint in listofwindowstartingpoints:
window = range(startingpoint, startingpoint + windowsize)
for a in window:
if a + tau < totaltimescale:
if water[a] == water[a + tau]:
counter += 1
totaljumpsanalysed += 1
output = float(counter)/float(totaljumpsanalysed)Context
StackExchange Code Review Q#14447, answer score: 3
Revisions (0)
No revisions yet.