HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Temperature Interpolation

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
interpolationtemperaturestackoverflow

Problem

I want to interpolate temperature in correlation with height. I have temperature data from stations with coordinates and height in this format:

Lat     Lon     Temp     Height

46.2956 16.3978 10.71196 120

45.9776 16.0159 11.78539 150

45.8521 15.9598 11.18838 133

etc.


After reading the data there are 4 arrays (y, x, temp, and height) and variable z which represents correlation between temperature and height, e.g., z = -0.005 which means that for each meter of height temperature decreases for 0.005 °C.

Z is calculated like this: z = np.polyfit(height, temp, 1,2)

After that I can calculate temperature on 0 meters for each station:

for i in range(len(temp)):
            temp0m.append((temp[i])-(height[i]*z))


I also have data with heights in meters with resolution of 0.003333333333 degrees and I want to use these heights and interpolated temperature to create map.

Heights are in "dem_final2.txt" (ncolums = 2700, nrows = 1560)

1653 1571 1493 1429 1354...
1730 1699 1620 1528 1399...


So this is what I've done so far, everything is working but it's very slow:

`if __name__ == '__main__':
nx, ny = 1080,624
chunkSize = 10
t = []
grid2 = []
xEast = 12
ySouth = 42
xWest = 21
yNorth = 47.2
res = 0.003333333333
x,y,temp0m,z = downloadData()

xi = np.linspace(xEast+0.1, xWest-0.1, nx)
yi = np.linspace(ySouth+0.1, yNorth-0.1, ny)
xi, yi = np.meshgrid(xi, yi)
xi, yi = xi.flatten(), yi.flatten()

l = len(xi)/chunkSize
for i in range(chunkSize):
interp = rbfInterp(x,y,temp0m,xi[(li):(l(i+1))],yi[(li):(l(i+1))])
grid2 = grid2 + list(interpData(xi[(li):(l(i+1))],yi[(li):(l(i+1))],z,interp,xEast,ySouth,xWest,yNorth,res))[(li):(l(i+1))]

grid2 = np.array(grid2, np.float)
grid2 = grid2.reshape((ny, nx))
plot (x,y,temp0m,grid2) #plotting to png with matplotlib

def rbfInterp(x, y, temp0m,xi,yi):
interp = Rbf(x, y, temp0m, functi

Solution

The problems start with this code being fairly unreadable. A few simple rules can help:

-
Every comma is followed by a space: xi, yi instead of xi,yi.

-
Every binary operator like + is surrounded by a space on each side:

int(round((xi[i] - xEast) / res, 0) + 1)


instead of int(round((xi[i]-xEast)/res,0)+1).

-
If an expression becomes to complicated, assign intermediate values to variables which also assist in explaining the code to a reader:

start = l * i
end = l * (i + 1)
chunk_x = xi[start:end]
chunk_y = yi[start:end]
interp = rbfInterp(x, y, temp0m, chunk_x, chunk_y)
result = interpData(chunk_x, chunk_y, z, interp, xEast, ySouth, xWest, yNorth, res)
grid2 += list(result)[start:end]


instead of

interp = rbfInterp(x,y,temp0m,xi[(l*i):(l*(i+1))],yi[(l*i):(l*(i+1))])
grid2 = grid2 + list(interpData(xi[(l*i):(l*(i+1))],yi[(l*i):(l*(i+1))],z,interp,xEast,ySouth,xWest,yNorth,res))[(l*i):(l*(i+1))]


-
At least try to provide meaningful names for your variables. Expressions like tmp2[tmp1] are unacceptable.

-
Avoid overly clever things like append = t.append; …; append(…), as you gain nothing by this indirection. t.append(…) directly.

-
In a function call, never put a space between the function name and the parens.

  • plot(x, y, temp0m, grid2) instead of plot (x,y,temp0m,grid2)



  • append(None) instead of append (None)



-
Most of Python is using the convention that variable names should use snake_case not camelCase. While consistency is more important than following any specific style, I suggest you adopt this common style.

-
Some variable names like chunkSize are misleading – this is the number of chunks, not their size.

Once we have cleaned up the code, we can investigate where performance could be increased. This is done mostly by avoiding unnecessary recalculations, but also by finding more efficient expressions.

For example, Rbf(x, y, temp0m, function='linear') will always produce the same value as the parameters never change. Therefore, calculate it once outside of any loops. Actually, you can remove the whole rbfInterp function.

Inside interpData, you translate and rescale the data: (yi[i]-ySouth)/res. If you do this outside of an explicit loop, then numpy can do this more efficiently: translated_yi = (yi - ySouth)/res.

The whole meshgrid business is obfuscating which calculation you are actually performing:

# prepare an interpolation function for this data
interpolation_function = Rbf(x, y, temp0m, function='linear')

# the actual coordinates
yis = np.linspace(ySouth + 0.1, yNorth - 0.1, ny)
xis = np.linspace(xEast + 0.1, xWest - 0.1, nx)

# the grid positions in the data file
yis_normalized = ((yis - ySouth)/res).round(0).astype(np.int)
xis_normalized = ((xis - xEast)/res + 1).round(0).astype(np.int)

grid = np.zeros((ny, nx))
for i in range(ny):
    heights = linecache.getline('dem_final2.txt', yis_normalized[i]).split(' ')
    for j in range(nx):
        height = float(height[xis_normalized[j]])
        result = height * z + interpolation_function(xis[j], yis[i])
        grid[i][j] = result if data_item > -10 else np.nan


Some important points:

  • We fetch and parse each line once per line, instead of once per grid column.



  • We factor out repeated calculations outside of the loops



  • We avoid any confusing reshaping by writing the results directly into the correct data structure.



  • We try to avoid explicit loops by using numpy transformations



The purpose of the code is now much more obvious.

However, the usage of linecache is quite suspicious. We read the lines in sequential order, so caching will be of no use → read the file manually. You should also consider reading it into a numpy array, as the file in this form should only be using a few MB of memory:

heights = np.loadtxt('dem_final2.txt')
...
result = heights[yis_normalized[i]][xis_normalized[j]] * z + ...

Code Snippets

int(round((xi[i] - xEast) / res, 0) + 1)
start = l * i
end = l * (i + 1)
chunk_x = xi[start:end]
chunk_y = yi[start:end]
interp = rbfInterp(x, y, temp0m, chunk_x, chunk_y)
result = interpData(chunk_x, chunk_y, z, interp, xEast, ySouth, xWest, yNorth, res)
grid2 += list(result)[start:end]
interp = rbfInterp(x,y,temp0m,xi[(l*i):(l*(i+1))],yi[(l*i):(l*(i+1))])
grid2 = grid2 + list(interpData(xi[(l*i):(l*(i+1))],yi[(l*i):(l*(i+1))],z,interp,xEast,ySouth,xWest,yNorth,res))[(l*i):(l*(i+1))]
# prepare an interpolation function for this data
interpolation_function = Rbf(x, y, temp0m, function='linear')

# the actual coordinates
yis = np.linspace(ySouth + 0.1, yNorth - 0.1, ny)
xis = np.linspace(xEast + 0.1, xWest - 0.1, nx)

# the grid positions in the data file
yis_normalized = ((yis - ySouth)/res).round(0).astype(np.int)
xis_normalized = ((xis - xEast)/res + 1).round(0).astype(np.int)

grid = np.zeros((ny, nx))
for i in range(ny):
    heights = linecache.getline('dem_final2.txt', yis_normalized[i]).split(' ')
    for j in range(nx):
        height = float(height[xis_normalized[j]])
        result = height * z + interpolation_function(xis[j], yis[i])
        grid[i][j] = result if data_item > -10 else np.nan
heights = np.loadtxt('dem_final2.txt')
...
result = heights[yis_normalized[i]][xis_normalized[j]] * z + ...

Context

StackExchange Code Review Q#47110, answer score: 8

Revisions (0)

No revisions yet.