
Predict new ratings for each user based on their Pearson correlation with other users

Submitted by: @import:stackexchange-codereview

Problem

I am new to R and to programming. I have a set of ratings for 45000 users and 40-odd movies. I need to predict new ratings for each user based on their Pearson correlation with other users. I also need to store the set of similar users for each user-movie combination. The code that I have managed to write is this:

# Matrix of users and ratings
x <- …

for (i in 1:nrow(x)) {
  for (j in 1:ncol(x)) {
    sim_user = which(cor_mat[i,] >= lower & cor_mat[i,] < upper)

    final_x[i,j] = t(x[sim_user,j]) %*%
      cor_mat[sim_user,j]/sum(cor_mat[sim_user,j])

    df[[length(df)+1]] = cbind.data.frame(i,j,sim_user,cor_mat[sim_user,j])
  }
}
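Written out, the prediction this loop is aiming for is a correlation-weighted average of the ratings given by the user's similar users. The formula below is a sketch of that intent, not the asker's own notation: $\rho_{u,v}$ stands for `cor_mat[u, v]` (assumed to be the users-by-users Pearson matrix), $r_{v,m}$ for `x[v, m]`, and $S_u$ for the `sim_user` set of user $u$. Note that the loop's `cor_mat[sim_user, j]` indexes that users-by-users matrix with the movie index `j` where the formula uses the user $u$.

$$S_u = \{\, v : \text{lower} \le \rho_{u,v} < \text{upper} \,\}, \qquad \hat r_{u,m} = \frac{\sum_{v \in S_u} \rho_{u,v}\, r_{v,m}}{\sum_{v \in S_u} \rho_{u,v}}$$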


Questions:

  • I am looping over each element of the matrix, which works fine but seems pretty novice to me. Can something better be done?
  • I have heard of the foreach package, but I read that it adds value only when a single operation takes a long time to execute, which is not the case here. Would it still give me good performance?

Solution

You have incorrect indexing in the loop. Here is a better version using mapply, with corrected indexing:

df <- mapply(function(i, j) {
  sim_user = which(cor_mat[i,] >= lower & cor_mat[i,] < upper)
  final_x[j, i] = t(x[j, sim_user]) %*% (cor_mat[j, sim_user]/sum(cor_mat[j, sim_user]))
  cbind(i, j, sim_user, cor_mat[j, sim_user])
}, 1:ncol(x), 1:nrow(x))


This version is already 44 times faster than the loop version:

test replications elapsed relative user.self sys.self user.child sys.child
1   loop          100    2.21     44.2      2.21        0         NA        NA
2 mapply          100    0.05      1.0      0.05        0         NA        NA
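Two properties of mapply() are worth checking against the goal of covering every user-movie combination. mapply() walks its vector arguments in parallel, recycling the shorter one, so `1:ncol(x)` and `1:nrow(x)` are paired position by position rather than crossed. Also, an assignment such as `final_x[j, i] = ...` inside the anonymous function only modifies a local copy of `final_x`, so results have to be returned. The toy sketch below shows the pairing, then crosses the indices with expand.grid() and writes the returned values into a matrix; the sizes and the `u * 10 + m` computation are hypothetical placeholders for the real per-pair prediction.

```r
# mapply() walks its vector arguments in parallel (recycling the shorter one);
# it does NOT form every combination of them.
pairs <- mapply(function(i, j) c(i = i, j = j), 1:4, 1:2)
# pairs has 4 columns: (i, j) = (1, 1), (2, 2), (3, 1), (4, 2)

# Hypothetical toy sizes standing in for the real users x movies matrix
n_users <- 4
n_movies <- 2

# Cross the indices first so every (user, movie) combination is visited once
grid <- expand.grid(user = seq_len(n_users), movie = seq_len(n_movies))

# Assigning to a matrix inside the function would only change a local copy,
# so return one value per pair and fill the matrix afterwards.
# `u * 10 + m` is a placeholder for the real prediction of user u on movie m.
values <- mapply(function(u, m) u * 10 + m, grid$user, grid$movie)
pred <- matrix(NA_real_, nrow = n_users, ncol = n_movies)
pred[cbind(grid$user, grid$movie)] <- values
# pred[2, 1] is now 21 and no cell is left NA
```

Indexing `pred` with a two-column matrix of (row, column) pairs writes all returned values in one assignment instead of one at a time.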

Code Snippets

df <- mapply(function(i, j) {
  sim_user = which(cor_mat[i,] >= lower & cor_mat[i,] < upper)
  final_x[j, i] = t(x[j, sim_user]) %*% (cor_mat[j, sim_user]/sum(cor_mat[j, sim_user]))
  cbind(i, j, sim_user, cor_mat[j, sim_user])
}, 1:ncol(x), 1:nrow(x))
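Because `sim_user` is computed from `cor_mat[i, ]` alone, the set of similar users depends only on the user, not on the movie. That allows a users-by-users weight matrix to be built once, so that every prediction comes from a single matrix product. The sketch below is not from the original answer and has not been timed on the question's data; `predict_ratings()` and `similar_users()` are hypothetical names. It assumes `x` is a users-by-movies matrix with no missing ratings, weights each similar user by their Pearson correlation, and (as an added assumption) excludes each user from their own neighbour set.

```r
# Hypothetical helper: predict every user-movie rating with one matrix product.
# Assumes x is a users x movies matrix with no missing ratings.
predict_ratings <- function(x, lower, upper) {
  cor_mat <- cor(t(x), method = "pearson")  # users x users correlations
  diag(cor_mat) <- NA                        # assumption: a user is not their own neighbour
  # Weight = correlation if it lies in [lower, upper), otherwise 0
  w <- cor_mat * (cor_mat >= lower & cor_mat < upper)
  w[is.na(w)] <- 0
  # (w %*% x)[u, m] = sum over v of w[u, v] * x[v, m]; divide by each user's
  # total weight. Users with no similar users get NaN (0 / 0).
  (w %*% x) / rowSums(w)
}

# Hypothetical helper: the similar users, stored once per user
# (the set is the same for every movie of that user).
similar_users <- function(x, lower, upper) {
  cor_mat <- cor(t(x), method = "pearson")
  diag(cor_mat) <- NA                        # same self-exclusion assumption
  lapply(seq_len(nrow(x)), function(u) {
    which(cor_mat[u, ] >= lower & cor_mat[u, ] < upper)  # NA is dropped by which()
  })
}
```

For example, with `x <- rbind(c(1, 2), c(2, 4), c(2, 1))`, `predict_ratings(x, lower = 0, upper = 2)` gives the rows (2, 4), (1, 2) and (NaN, NaN): users 1 and 2 are perfectly correlated with each other, while user 3 has no neighbour with a non-negative correlation. On the full data, keep in mind that a dense 45000 x 45000 matrix of doubles takes about 16 GB (45000² × 8 bytes), so the weight matrix may need to be built in blocks of users.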

Context

StackExchange Code Review Q#133225, answer score: 3
