
Pairwise distance and residual calculation

Submitted by: @import:stackexchange-codereview

Problem

I have code that calculates the pairwise distances and the residuals of my data (X, Y, Z). The data is fairly large (7000 rows on average), so my main concern is efficiency. My initial code is:

import tkinter as tk
from tkinter import filedialog
import pandas as pd
import numpy as np
from scipy.spatial.distance import pdist, squareform

root = tk.Tk()
root.withdraw()
file_path = filedialog.askopenfilename()

data = pd.read_excel(file_path)
data = np.array(data, dtype=float)  # the np.float alias was removed in NumPy 1.24
npoints, cols = data.shape

pwdistance = np.zeros((npoints, npoints))
pwresidual = np.zeros((npoints, npoints))
for i in range(npoints):
    for j in range(npoints):
        pwdistance[i][j] = np.sqrt((data[:,0][i]-data[:,0][j])**2 + (data[:,1][i]-data[:,1][j])**2)
        pwresidual[i][j] = (data[:,2][i]-data[:,2][j])**2


For pwdistance, I switched to the following, which works extremely well:

pwdistance = squareform(pdist(data[:,:2]))
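
A quick way to gain confidence in that replacement is to compare it with the loop on a small array. The sketch below is only an illustration: `sample` is a hypothetical random stand-in for the spreadsheet data, and the variable names are not from the original post.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical stand-in for the spreadsheet: 50 random (X, Y, Z) rows.
rng = np.random.default_rng(0)
sample = rng.random((50, 3))
n = sample.shape[0]

# Reference result from the explicit double loop over all pairs.
loop_distance = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        dx = sample[i, 0] - sample[j, 0]
        dy = sample[i, 1] - sample[j, 1]
        loop_distance[i, j] = np.sqrt(dx**2 + dy**2)

# Vectorized version: pdist returns the condensed distances of the
# first two columns, and squareform expands them to an n-by-n matrix.
fast_distance = squareform(pdist(sample[:, :2]))

# True when both matrices agree up to floating-point tolerance.
distances_match = np.allclose(loop_distance, fast_distance)
print(distances_match)
```

The comparison uses np.allclose rather than exact equality because the two versions may round intermediate results differently.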


Is there a Pythonic way to calculate pwresidual without a loop, so that my code runs faster?

Solution

Let's start by setting

col2 = data[:,2]


to avoid some repetition. (If I knew what was in the data array then maybe I could pick a better name, but all I have to go on is what was in the post.)

Now here are two possible approaches to computing pwresidual:

- pwresidual[i,j] is the result of an operation on col2[i] and col2[j]. You might recognize this as being similar to an outer product in mathematics, except that the operation is subtraction rather than multiplication.

In NumPy, every binary universal function (ufunc) has an "outer" method, and np.subtract is a binary ufunc, so all we need is:

pwresidual = np.subtract.outer(col2, col2) ** 2


- Alternatively, we can use NumPy's broadcasting mechanism, combined with np.newaxis to ensure every pair of items gets operated on:

pwresidual = (col2[:,np.newaxis] - col2[np.newaxis,:]) ** 2
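
Both approaches can be checked against the original double loop on a small array. This is a minimal sketch under stated assumptions: `sample` is a hypothetical random stand-in for the real data, and the names `loop_residual`, `outer_residual`, and `broadcast_residual` are introduced here only for illustration.

```python
import numpy as np

# Hypothetical stand-in for the spreadsheet: 50 random (X, Y, Z) rows.
rng = np.random.default_rng(0)
sample = rng.random((50, 3))
col2 = sample[:, 2]
n = col2.shape[0]

# Reference result from the explicit double loop in the question.
loop_residual = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        loop_residual[i, j] = (col2[i] - col2[j]) ** 2

# Approach 1: the "outer" method of the np.subtract ufunc.
outer_residual = np.subtract.outer(col2, col2) ** 2

# Approach 2: broadcasting a column vector against a row vector.
broadcast_residual = (col2[:, np.newaxis] - col2[np.newaxis, :]) ** 2

# True when both vectorized results agree with the loop.
residuals_match = (
    np.allclose(loop_residual, outer_residual)
    and np.allclose(loop_residual, broadcast_residual)
)
print(residuals_match)
```

Either approach still builds a full n-by-n array of 8-byte floats. For 7000 rows that is 7000 × 7000 × 8 bytes, roughly 392 MB per matrix, so memory rather than loop overhead may become the limit on larger inputs.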

Code Snippets

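col2 = data[:,2]
# Option 1: outer method of the np.subtract ufunc
pwresidual = np.subtract.outer(col2, col2) ** 2
# Option 2 (equivalent): broadcasting with np.newaxis
pwresidual = (col2[:,np.newaxis] - col2[np.newaxis,:]) ** 2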

Context

StackExchange Code Review Q#152951, answer score: 4
