patternpythonMinor
Temperature Interpolation
Viewed 0 times
interpolationtemperaturestackoverflow
Problem
I want to interpolate temperature in correlation with height. I have temperature data from stations with coordinates and height in this format:
After reading the data there are 4 arrays (
Z is calculated like this:
After that I can calculate temperature on 0 meters for each station:
I also have data with heights in meters with resolution of 0.003333333333 degrees and I want to use these heights and interpolated temperature to create map.
Heights are in
So this is what I've done so far, everything is working but it's very slow:
`if __name__ == '__main__':
nx, ny = 1080,624
chunkSize = 10
t = []
grid2 = []
xEast = 12
ySouth = 42
xWest = 21
yNorth = 47.2
res = 0.003333333333
x,y,temp0m,z = downloadData()
xi = np.linspace(xEast+0.1, xWest-0.1, nx)
yi = np.linspace(ySouth+0.1, yNorth-0.1, ny)
xi, yi = np.meshgrid(xi, yi)
xi, yi = xi.flatten(), yi.flatten()
l = len(xi)/chunkSize
for i in range(chunkSize):
interp = rbfInterp(x,y,temp0m,xi[(li):(l(i+1))],yi[(li):(l(i+1))])
grid2 = grid2 + list(interpData(xi[(li):(l(i+1))],yi[(li):(l(i+1))],z,interp,xEast,ySouth,xWest,yNorth,res))[(li):(l(i+1))]
grid2 = np.array(grid2, np.float)
grid2 = grid2.reshape((ny, nx))
plot (x,y,temp0m,grid2) #plotting to png with matplotlib
def rbfInterp(x, y, temp0m,xi,yi):
interp = Rbf(x, y, temp0m, functi
Lat Lon Temp Height
46.2956 16.3978 10.71196 120
45.9776 16.0159 11.78539 150
45.8521 15.9598 11.18838 133
etc.After reading the data there are 4 arrays (
y, x, temp, and height) and variable z which represents correlation between temperature and height, e.g., z = -0.005 which means that for each meter of height temperature decreases for 0.005 °C.Z is calculated like this:
z = np.polyfit(height, temp, 1,2)After that I can calculate temperature on 0 meters for each station:
for i in range(len(temp)):
temp0m.append((temp[i])-(height[i]*z))I also have data with heights in meters with resolution of 0.003333333333 degrees and I want to use these heights and interpolated temperature to create map.
Heights are in
"dem_final2.txt" (ncolums = 2700, nrows = 1560)1653 1571 1493 1429 1354...
1730 1699 1620 1528 1399...So this is what I've done so far, everything is working but it's very slow:
`if __name__ == '__main__':
nx, ny = 1080,624
chunkSize = 10
t = []
grid2 = []
xEast = 12
ySouth = 42
xWest = 21
yNorth = 47.2
res = 0.003333333333
x,y,temp0m,z = downloadData()
xi = np.linspace(xEast+0.1, xWest-0.1, nx)
yi = np.linspace(ySouth+0.1, yNorth-0.1, ny)
xi, yi = np.meshgrid(xi, yi)
xi, yi = xi.flatten(), yi.flatten()
l = len(xi)/chunkSize
for i in range(chunkSize):
interp = rbfInterp(x,y,temp0m,xi[(li):(l(i+1))],yi[(li):(l(i+1))])
grid2 = grid2 + list(interpData(xi[(li):(l(i+1))],yi[(li):(l(i+1))],z,interp,xEast,ySouth,xWest,yNorth,res))[(li):(l(i+1))]
grid2 = np.array(grid2, np.float)
grid2 = grid2.reshape((ny, nx))
plot (x,y,temp0m,grid2) #plotting to png with matplotlib
def rbfInterp(x, y, temp0m,xi,yi):
interp = Rbf(x, y, temp0m, functi
Solution
The problems start with this code being fairly unreadable. A few simple rules can help:
-
Every comma is followed by a space:
-
Every binary operator like
instead of
-
If an expression becomes to complicated, assign intermediate values to variables which also assist in explaining the code to a reader:
instead of
-
At least try to provide meaningful names for your variables. Expressions like
-
Avoid overly clever things like
-
In a function call, never put a space between the function name and the parens.
-
Most of Python is using the convention that variable names should use
-
Some variable names like
Once we have cleaned up the code, we can investigate where performance could be increased. This is done mostly by avoiding unnecessary recalculations, but also by finding more efficient expressions.
For example,
Inside
The whole
Some important points:
The purpose of the code is now much more obvious.
However, the usage of
-
Every comma is followed by a space:
xi, yi instead of xi,yi.-
Every binary operator like
+ is surrounded by a space on each side: int(round((xi[i] - xEast) / res, 0) + 1)instead of
int(round((xi[i]-xEast)/res,0)+1).-
If an expression becomes to complicated, assign intermediate values to variables which also assist in explaining the code to a reader:
start = l * i
end = l * (i + 1)
chunk_x = xi[start:end]
chunk_y = yi[start:end]
interp = rbfInterp(x, y, temp0m, chunk_x, chunk_y)
result = interpData(chunk_x, chunk_y, z, interp, xEast, ySouth, xWest, yNorth, res)
grid2 += list(result)[start:end]instead of
interp = rbfInterp(x,y,temp0m,xi[(l*i):(l*(i+1))],yi[(l*i):(l*(i+1))])
grid2 = grid2 + list(interpData(xi[(l*i):(l*(i+1))],yi[(l*i):(l*(i+1))],z,interp,xEast,ySouth,xWest,yNorth,res))[(l*i):(l*(i+1))]-
At least try to provide meaningful names for your variables. Expressions like
tmp2[tmp1] are unacceptable.-
Avoid overly clever things like
append = t.append; …; append(…), as you gain nothing by this indirection. t.append(…) directly.-
In a function call, never put a space between the function name and the parens.
plot(x, y, temp0m, grid2)instead ofplot (x,y,temp0m,grid2)
append(None)instead ofappend (None)
-
Most of Python is using the convention that variable names should use
snake_case not camelCase. While consistency is more important than following any specific style, I suggest you adopt this common style.-
Some variable names like
chunkSize are misleading – this is the number of chunks, not their size.Once we have cleaned up the code, we can investigate where performance could be increased. This is done mostly by avoiding unnecessary recalculations, but also by finding more efficient expressions.
For example,
Rbf(x, y, temp0m, function='linear') will always produce the same value as the parameters never change. Therefore, calculate it once outside of any loops. Actually, you can remove the whole rbfInterp function.Inside
interpData, you translate and rescale the data: (yi[i]-ySouth)/res. If you do this outside of an explicit loop, then numpy can do this more efficiently: translated_yi = (yi - ySouth)/res.The whole
meshgrid business is obfuscating which calculation you are actually performing:# prepare an interpolation function for this data
interpolation_function = Rbf(x, y, temp0m, function='linear')
# the actual coordinates
yis = np.linspace(ySouth + 0.1, yNorth - 0.1, ny)
xis = np.linspace(xEast + 0.1, xWest - 0.1, nx)
# the grid positions in the data file
yis_normalized = ((yis - ySouth)/res).round(0).astype(np.int)
xis_normalized = ((xis - xEast)/res + 1).round(0).astype(np.int)
grid = np.zeros((ny, nx))
for i in range(ny):
heights = linecache.getline('dem_final2.txt', yis_normalized[i]).split(' ')
for j in range(nx):
height = float(height[xis_normalized[j]])
result = height * z + interpolation_function(xis[j], yis[i])
grid[i][j] = result if data_item > -10 else np.nanSome important points:
- We fetch and parse each line once per line, instead of once per grid column.
- We factor out repeated calculations outside of the loops
- We avoid any confusing reshaping by writing the results directly into the correct data structure.
- We try to avoid explicit loops by using numpy transformations
The purpose of the code is now much more obvious.
However, the usage of
linecache is quite suspicious. We read the lines in sequential order, so caching will be of no use → read the file manually. You should also consider reading it into a numpy array, as the file in this form should only be using a few MB of memory:heights = np.loadtxt('dem_final2.txt')
...
result = heights[yis_normalized[i]][xis_normalized[j]] * z + ...Code Snippets
int(round((xi[i] - xEast) / res, 0) + 1)start = l * i
end = l * (i + 1)
chunk_x = xi[start:end]
chunk_y = yi[start:end]
interp = rbfInterp(x, y, temp0m, chunk_x, chunk_y)
result = interpData(chunk_x, chunk_y, z, interp, xEast, ySouth, xWest, yNorth, res)
grid2 += list(result)[start:end]interp = rbfInterp(x,y,temp0m,xi[(l*i):(l*(i+1))],yi[(l*i):(l*(i+1))])
grid2 = grid2 + list(interpData(xi[(l*i):(l*(i+1))],yi[(l*i):(l*(i+1))],z,interp,xEast,ySouth,xWest,yNorth,res))[(l*i):(l*(i+1))]# prepare an interpolation function for this data
interpolation_function = Rbf(x, y, temp0m, function='linear')
# the actual coordinates
yis = np.linspace(ySouth + 0.1, yNorth - 0.1, ny)
xis = np.linspace(xEast + 0.1, xWest - 0.1, nx)
# the grid positions in the data file
yis_normalized = ((yis - ySouth)/res).round(0).astype(np.int)
xis_normalized = ((xis - xEast)/res + 1).round(0).astype(np.int)
grid = np.zeros((ny, nx))
for i in range(ny):
heights = linecache.getline('dem_final2.txt', yis_normalized[i]).split(' ')
for j in range(nx):
height = float(height[xis_normalized[j]])
result = height * z + interpolation_function(xis[j], yis[i])
grid[i][j] = result if data_item > -10 else np.nanheights = np.loadtxt('dem_final2.txt')
...
result = heights[yis_normalized[i]][xis_normalized[j]] * z + ...Context
StackExchange Code Review Q#47110, answer score: 8
Revisions (0)
No revisions yet.