Cosine similarity of one vector with many
Problem
I'm keen to hear ideas for optimising R code to compute the cosine similarity of a vector x (with length l) with n other vectors (stored in any structure such as a matrix m with n rows and l columns).

Values for n will typically be much larger than values for l.

I'm currently using this custom Rcpp function to compute the similarity of a vector x to each row of a matrix m:

library(Rcpp)
cppFunction('NumericVector cosine_x_to_m(NumericVector x, NumericMatrix m) {
int nrows = m.nrow();
NumericVector out(nrows);
for (int i = 0; i < nrows; i++) {
NumericVector y = m(i, _);
out[i] = sum(x * y) / sqrt(sum(pow(x, 2.0)) * sum(pow(y, 2.0)));
}
return out;
}')

Varying n and l, I'm getting the following sorts of timings:

Reproducible code below.
# Function to simulate data
sim_data %
mutate(timings = map2(l, n, timer))
# Plot results
results_plot %
unnest(timings) %>%
mutate(time = time / 1000000) %>% # Convert time to seconds
group_by(l, n) %>%
summarise(mean = mean(time), ci = 1.96 * sd(time) / sqrt(n()))
pd %
ggplot(aes(n, mean, group= l)) +
geom_line(aes(color = factor(l)), position = pd, size = 2) +
geom_errorbar(aes(ymin = mean - ci, ymax = mean + ci), position = pd, width = 100) +
geom_point(position = pd, size = 2) +
scale_color_brewer(palette = "Blues") +
theme_minimal() +
labs(x = "n", y = "Seconds", color = "l") +
ggtitle("Algorithm Runtime",
subtitle = "Error bars represent 95% confidence intervals")
Solution
I'm using Microsoft R (with Intel MKL), which makes matrix multiplications faster, but for a fair comparison I set it to be single-threaded.

setMKLthreads(1)

In my tests this pure R version of cosine_x_to_m is twice as fast as yours.

cosine_x_to_m2 = function(x,m){
  x = x / sqrt(crossprod(x));
  return( as.vector((m %*% x) / sqrt(rowSums(m^2))) );
}

Rewriting rowSums(m^2) in C/C++ makes it even faster, about four times faster than the original.

library(ramwas)
cosine_x_to_m2 = function(x,m){
  x = x / sqrt(crossprod(x));
  return( as.vector((m %*% x) / sqrt(rowSumsSq(m))) );
}

Initial performance:
Final version performance:
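For readers who want to see what the R versions above compute without the BLAS call, here is a minimal standalone C++ sketch of the same idea: the norm of x is computed once, and each row then needs only one pass that accumulates the dot product with x and the row's sum of squares. This is an illustration, not the code that was benchmarked. The function name cosine_rows and the flat row-major layout of m are assumptions made for the example (R matrices are column-major, so an Rcpp port would index differently).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity of x with each row of m.
// Assumption: m is a flat row-major buffer holding n rows of x.size() columns.
// (R stores matrices column-major; this layout is for illustration only.)
std::vector<double> cosine_rows(const std::vector<double>& x,
                                const std::vector<double>& m,
                                std::size_t n) {
    const std::size_t l = x.size();

    // The norm of x does not depend on the row, so compute it once.
    double x_sq = 0.0;
    for (double v : x) x_sq += v * v;
    const double x_norm = std::sqrt(x_sq);

    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double* row = m.data() + i * l;
        double dot = 0.0;     // running sum of row[j] * x[j]
        double row_sq = 0.0;  // running sum of row[j]^2
        for (std::size_t j = 0; j < l; ++j) {
            dot += row[j] * x[j];
            row_sq += row[j] * row[j];
        }
        // An all-zero row (or an all-zero x) gives 0/0, i.e. NaN.
        out[i] = dot / (x_norm * std::sqrt(row_sq));
    }
    return out;
}
```

Whether a hand-written loop like this beats m %*% x on an optimised BLAS will depend on n, l and the BLAS in use, so it is worth timing on your own data.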
Code Snippets
setMKLthreads(1)

cosine_x_to_m2 = function(x,m){
  x = x / sqrt(crossprod(x));
  return( as.vector((m %*% x) / sqrt(rowSums(m^2))) );
}

library(ramwas)
cosine_x_to_m2 = function(x,m){
  x = x / sqrt(crossprod(x));
  return( as.vector((m %*% x) / sqrt(rowSumsSq(m))) );
}

Context
StackExchange Code Review Q#159396, answer score: 4