Picking the top 5 results for each individual

Problem

For loops are the devil, but I can't see a way to speed this up with apply or other faster functions.

pred <- read.csv("predictions_from_external_multinomial_model.csv")

pred$id <- test_users$user_id

pred$first   <- "a"
pred$second  <- "b"
pred$third   <- "c"
pred$fourth  <- "d"
pred$fifth   <- "e"

for(i in 1:nrow(pred)){
  pred$first[i]   <- names(pred[which(pred[i,] == max(pred[i,2:13]))])
  pred[i,names(pred[which(pred[i,] == max(pred[i,2:13]))])] <- 0

  pred$second[i]  <- names(pred[which(pred[i,] == max(pred[i,2:13]))])
  pred[i,names(pred[which(pred[i,] == max(pred[i,2:13]))])] <- 0

  pred$third[i]   <- names(pred[which(pred[i,] == max(pred[i,2:13]))])
  pred[i,names(pred[which(pred[i,] == max(pred[i,2:13]))])] <- 0

  pred$fourth[i]   <- names(pred[which(pred[i,] == max(pred[i,2:13]))])
  pred[i,names(pred[which(pred[i,] == max(pred[i,2:13]))])] <- 0

  pred$fifth[i]   <- names(pred[which(pred[i,] == max(pred[i,2:13]))])
  pred[i,names(pred[which(pred[i,] == max(pred[i,2:13]))])] <- 0
}

data_long <- melt(pred, id.vars=("id"), measure.vars = c("first", "second", "third", "fourth", "fifth"), value.name = "country")
data_long <- data_long[order(data_long$id),]
data_long$variable <- NULL


The code picks the top 5 results for each individual, based on the values in 12 columns of the original prediction dataset.

In the imported CSV file, each row is a unique individual and each possible prediction class is in a column, indexed from 2 to 13.
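
To make the task concrete, here is a minimal sketch of the top-5 selection for a single individual. The class names and scores are made up for illustration and do not come from the linked dataset.

```r
# Hypothetical scores for one individual (names and values are invented).
scores <- c(US = 0.40, FR = 0.05, IT = 0.20, GB = 0.10,
            ES = 0.15, DE = 0.07, NL = 0.03)

# order(..., decreasing = TRUE) returns positions from highest to lowest score;
# the first five positions are the five best classes, in rank order.
top5 <- names(scores)[order(scores, decreasing = TRUE)[1:5]]
top5
```

The loop in the question does the same thing one rank at a time, by finding the maximum and then zeroing it out.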

Here's an example dataset in case you wish to reproduce it (you can use line numbers or an arbitrary sequence in place of test_users$user_id in the code above):

http://wikisend.com/download/535006/2023bd19b880.csv

Solution

This solution uses data.table and avoids explicit loops.

library(data.table)

pred <- fread("/home/djhurio/Downloads/2023bd19b880.csv")

# Use the row number as a stand-in for the user id
pred[, id := seq_len(.N)]

# Melt to long format: one row per (id, country) pair, with the score in "value"
data_long <- melt(pred, id.vars = "id", measure.vars = names(pred)[2:13],
                  variable.name = "country")

# Sort by id (ascending) and value (descending)
setorderv(data_long, c("id", "value"), order = c(1, -1))

# Number the records within each id, so that i is the rank of each country
data_long[, i := seq_len(.N), by = id]

# Keep the top 5 countries for each id
data_long <- data_long[i <= 5, .(id, country)]

data_long
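
Since the question asks about apply, a base R variant without extra packages might look like the sketch below. It is not part of the original answer; the toy data frame, its column names (row, class1 to class12) and the random scores are assumptions made only so the example runs on its own.

```r
set.seed(1)

# Toy stand-in for the predictions file (an assumption, not the real data):
# column 1 is a row label, columns 2 to 13 hold scores for 12 made-up classes.
pred <- data.frame(
  row = 1:4,
  matrix(runif(4 * 12), nrow = 4,
         dimnames = list(NULL, paste0("class", 1:12)))
)
pred$id <- seq_len(nrow(pred))

# Work on a numeric matrix of the 12 score columns only.
scores <- as.matrix(pred[, 2:13])

# For each row, take the names of the five highest-scoring columns, best first.
# apply() returns a 5 x nrow(pred) character matrix, one column per individual.
top5 <- apply(scores, 1, function(x) names(x)[order(x, decreasing = TRUE)[1:5]])

# Flatten column by column so each id is followed by its five classes in rank order.
data_long <- data.frame(id = rep(pred$id, each = 5),
                        country = as.vector(top5))
data_long
```

This still calls an R function once per row, so on large inputs the data.table approach is likely to be faster, but it removes the repeated find-and-zero steps of the original loop.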

Context

StackExchange Code Review Q#116853, answer score: 3
