Picking the top 5 results for each individual
Problem
For loops are the devil. I can't see a way to speed this up with apply or other, faster functions, though:

library(reshape2)  # assumed source of the melt() call used below

pred <- read.csv("predictions_from_external_multinomial_model.csv")
pred$id <- test_users$user_id
pred$first <- "a"
pred$second <- "b"
pred$third <- "c"
pred$fourth <- "d"
pred$fifth <- "e"
# For each row: record the name of the column holding the current maximum,
# then zero that cell so the next pass finds the runner-up (five passes in total).
for (i in 1:nrow(pred)) {
  pred$first[i] <- names(pred[which(pred[i, ] == max(pred[i, 2:13]))])
  pred[i, names(pred[which(pred[i, ] == max(pred[i, 2:13]))])] <- 0
  pred$second[i] <- names(pred[which(pred[i, ] == max(pred[i, 2:13]))])
  pred[i, names(pred[which(pred[i, ] == max(pred[i, 2:13]))])] <- 0
  pred$third[i] <- names(pred[which(pred[i, ] == max(pred[i, 2:13]))])
  pred[i, names(pred[which(pred[i, ] == max(pred[i, 2:13]))])] <- 0
  pred$fourth[i] <- names(pred[which(pred[i, ] == max(pred[i, 2:13]))])
  pred[i, names(pred[which(pred[i, ] == max(pred[i, 2:13]))])] <- 0
  pred$fifth[i] <- names(pred[which(pred[i, ] == max(pred[i, 2:13]))])
  pred[i, names(pred[which(pred[i, ] == max(pred[i, 2:13]))])] <- 0
}
data_long <- melt(pred, id.vars=("id"), measure.vars = c("first", "second", "third", "fourth", "fifth"), value.name = "country")
data_long <- data_long[order(data_long$id),]
data_long$variable <- NULL

What it's doing is picking the top 5 results for each individual, based on the values in 12 columns of the original prediction dataset.
In the imported CSV file, each row is a unique individual and each possible prediction class is in its own column, indexed from 2 to 13.
Here's an example dataset in case you wish to reproduce it (you can use line numbers or an arbitrary sequence in place of test_users$user_id in the code above):
http://wikisend.com/download/535006/2023bd19b880.csv
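
In case the linked file is unavailable, a toy stand-in with the same layout can be generated as sketched below. Everything in it is an assumption made for illustration: the column names (X, class1 to class12), the four rows, and the random scores are not the real data. The last lines show what "top 5" means for a single individual.

```r
# Toy stand-in for the linked CSV (all names and values here are made up):
# column 1 is a row label, columns 2 to 13 hold the 12 class scores.
set.seed(1)
classes <- paste0("class", 1:12)
scores <- matrix(runif(4 * 12), nrow = 4, dimnames = list(NULL, classes))
pred <- data.frame(X = 1:4, scores)
pred$id <- seq_len(nrow(pred))  # arbitrary sequence in place of test_users$user_id

# "Top 5" for one individual: the names of the 5 largest scores in that row
row1 <- unlist(pred[1, 2:13])
top5_row1 <- names(sort(row1, decreasing = TRUE))[1:5]
top5_row1
```

The desired output is this kind of five-name vector for every row, stacked into a long table keyed by id.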
Solution
This is a solution using data.table, without loops.

require(data.table)
pred <- fread("/home/djhurio/Downloads/2023bd19b880.csv")
pred[, id := 1:.N]
# Melt to long format
data_long <- melt(pred, id.vars = "id", measure.vars = names(pred)[2:13],
variable.name = "country")
# Order by id (ascending) and value (descending)
setorderv(data_long, c("id", "value"), order = c(1, -1))
# Number the records within each id
data_long[, i := 1:.N, by = id]
# Select the top 5 per id
data_long <- data_long[i < 6, .(id, country)]
data_long
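
The same melt, sort, and filter idea can be tried without the downloaded file. The sketch below is a variant, not the code above: it assumes made-up toy data (three individuals, hypothetical columns class1 to class12) and replaces the explicit row counter with head(.SD, 5) per group.

```r
library(data.table)

# Hypothetical toy data: 3 individuals, 12 made-up class columns
set.seed(42)
classes <- paste0("class", 1:12)
pred <- as.data.table(matrix(runif(3 * 12), nrow = 3,
                             dimnames = list(NULL, classes)))
pred[, id := seq_len(.N)]

# Melt to long format: one row per (id, class) pair
data_long <- melt(pred, id.vars = "id", measure.vars = classes,
                  variable.name = "country")

# Sort by value (descending) within id, then keep the first 5 rows per id
top5 <- data_long[order(id, -value), head(.SD, 5), by = id][, .(id, country)]
top5
```

With the counter column gone there is nothing to drop afterwards, at the cost of sorting a copy instead of reordering data_long in place as setorderv does.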

Context
StackExchange Code Review Q#116853, answer score: 3