patternpythonMinor
Implementation of KNN in R
Viewed 0 times
implementationknnstackoverflow
Problem
I have implemented the K-Nearest Neighbor algorithm with Euclidean distance in R. It works fine but takes tremendously huge time than the library function (get.knn). Please point out the possibility of improvement.
knn<-function(list,k){
n=nrow(list)
if (n<=k) stop("k can not be more than n-1")
neigh<- matrix(0,nrow=n,ncol=k)
for(i in 1:n){
dist<-matrix(0,ncol=2,nrow=n)
for (j in 1:n){
dist[j,1]<-j
dist[j,2]<-sum((list[i,]-list[j,])^2)
#dist[j,2]<-dtw(list[i,],list[j,])$distance
}
sorted<-dist[order(dist[,2]),]
neigh[i,]<-sorted[2:(k+1),1]
}
return(neigh)
}Solution
You can improve your code by using vectorization to speed up the computation of Euclidean distances in the inner loop. The code would be:
Notice that I also made a few changes:
The code above might still be a bit too slow because of the computations inside the
If my first suggestion is still too slow and your matrix is so large that my second solution cannot be used, you might want to use a mix between the two approaches, where you loop on chunks of rows, i.e. rely on
Finally, I will point out that if you are interested, you could search CRAN or the internet for a package that does exactly what you are after. KNN is a very common tool and there must be packages (compiled from C) that already do it much faster than this code will do. But not much to be learnt there...
knn <- function(mat, k){
n <- nrow(mat)
if (n <= k) stop("k can not be more than n-1")
neigh <- matrix(0, nrow = n, ncol = k)
for(i in 1:n) {
euc.dist <- colSums((mat[i, ] - t(mat)) ^ 2)
neigh[i, ] <- order(euc.dist)[2:(k + 1)]
}
return(neigh)
}Notice that I also made a few changes:
listis a function in R so calling your objectlistis a pretty bad idea. Also, in the R language, a "list" refers to a very specific data structure, while your code seems to be using a matrix. So calling that inputmatseemed more appropriate.
- Similarly, there is a
distfunction in R so it is a bad idea to name your variable this way. I choseeuc.distsince you were computing an euclidean distance.
orderalready returns an index, so the whole idea of column-biding your indices and distances was not necessary: you just need a vector of distances.
The code above might still be a bit too slow because of the computations inside the
for loop. If the dimensions of your matrix are not astronomical, you could compute the matrix of distances at first using a compiled function. I once tested about a dozen of packages and found that fields::rdist (written in Fortran) was the fastest. If you do not wish to install it, you can use the base::dist function. The code would look like this:knn <- function(mat, k){
n <- nrow(mat)
if (n <= k) stop("k can not be more than n-1")
neigh <- matrix(0, nrow = n, ncol = k)
library(fields)
dist.mat <- rdist(mat, mat)
for(i in 1:n) {
euc.dist <- dist.mat[i, ]
neigh[i, ] <- order(euc.dist)[2:(k + 1)]
}
return(neigh)
}If my first suggestion is still too slow and your matrix is so large that my second solution cannot be used, you might want to use a mix between the two approaches, where you loop on chunks of rows, i.e. rely on
rdist(chunk_of_rows_from_mat, mat). But I'll leave that to you as an exercise :-)Finally, I will point out that if you are interested, you could search CRAN or the internet for a package that does exactly what you are after. KNN is a very common tool and there must be packages (compiled from C) that already do it much faster than this code will do. But not much to be learnt there...
Code Snippets
knn <- function(mat, k){
n <- nrow(mat)
if (n <= k) stop("k can not be more than n-1")
neigh <- matrix(0, nrow = n, ncol = k)
for(i in 1:n) {
euc.dist <- colSums((mat[i, ] - t(mat)) ^ 2)
neigh[i, ] <- order(euc.dist)[2:(k + 1)]
}
return(neigh)
}knn <- function(mat, k){
n <- nrow(mat)
if (n <= k) stop("k can not be more than n-1")
neigh <- matrix(0, nrow = n, ncol = k)
library(fields)
dist.mat <- rdist(mat, mat)
for(i in 1:n) {
euc.dist <- dist.mat[i, ]
neigh[i, ] <- order(euc.dist)[2:(k + 1)]
}
return(neigh)
}Context
StackExchange Code Review Q#94410, answer score: 3
Revisions (0)
No revisions yet.