Pairwise distance and residual calculation
Problem
I have code that calculates the pairwise distances and the residuals of my data (X, Y, Z). The data is quite large (7000 rows on average), so I am interested in efficiency. My initial code is:
    import tkinter as tk
    from tkinter import filedialog
    import pandas as pd
    import numpy as np
    from scipy.spatial.distance import pdist, squareform

    # Ask the user for the Excel file without showing the root window
    root = tk.Tk()
    root.withdraw()
    file_path = filedialog.askopenfilename()

    # Load the data as a float array with columns X, Y, Z
    data = pd.read_excel(file_path)
    data = np.array(data, dtype=float)
    npoints, cols = data.shape

    pwdistance = np.zeros((npoints, npoints))
    pwresidual = np.zeros((npoints, npoints))
    for i in range(npoints):
        for j in range(npoints):
            # Euclidean distance in the X-Y plane
            pwdistance[i][j] = np.sqrt((data[:,0][i]-data[:,0][j])**2 + (data[:,1][i]-data[:,1][j])**2)
            # Squared difference of the Z values
            pwresidual[i][j] = (data[:,2][i]-data[:,2][j])**2

For pwdistance, I changed it to the following, which works extremely well:

    pwdistance = squareform(pdist(data[:,:2]))

Is there a Pythonic way of calculating pwresidual, so that I do not need a loop and my code runs faster?

Solution
Let's start by setting

    col2 = data[:,2]

to avoid some repetition. (If I knew what was in the data array then maybe I could pick a better name, but all I have to go on is what was in the post.)

Now here are two possible approaches to computing pwresidual:

- pwresidual[i,j] is the result of an operation on col2[i] and col2[j]. You might recognize this as being similar to an outer product in mathematics, except that the operation is subtraction rather than multiplication.

  But in NumPy every binary universal function has an "outer" method, and np.subtract is a binary universal function, so all we need is:

      pwresidual = np.subtract.outer(col2, col2) ** 2

- Alternatively, we can use NumPy's broadcasting mechanism, combined with np.newaxis to ensure every pair of items gets operated on:

      pwresidual = (col2[:,np.newaxis] - col2[np.newaxis,:]) ** 2

Code Snippets
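As a sanity check, the two approaches from the answer can be compared against the original double loop on a small synthetic array. This is only a sketch: the random col2 below is a hypothetical stand-in for data[:,2], whose real contents are not given in the post.

```python
import numpy as np

# Hypothetical stand-in for data[:,2]; the real values are not in the post.
rng = np.random.default_rng(0)
col2 = rng.normal(size=50)
n = col2.shape[0]

# Reference result: the explicit double loop from the question.
expected = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        expected[i, j] = (col2[i] - col2[j]) ** 2

# Approach 1: the "outer" method of the np.subtract ufunc.
via_outer = np.subtract.outer(col2, col2) ** 2

# Approach 2: broadcasting a column vector against a row vector.
via_broadcast = (col2[:, np.newaxis] - col2[np.newaxis, :]) ** 2

# Both vectorized results should match the loop.
print(np.allclose(expected, via_outer))
print(np.allclose(expected, via_broadcast))
```

Both vectorized forms build the full n-by-n result in one step, so memory is the main thing to watch: at n = 7000 each float64 matrix takes roughly 392 MB.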
    col2 = data[:,2]

    pwresidual = np.subtract.outer(col2, col2) ** 2

    pwresidual = (col2[:,np.newaxis] - col2[np.newaxis,:]) ** 2

Context
StackExchange Code Review Q#152951, answer score: 4