Returning an existing data frame with four new columns

Submitted by: @import:stackexchange-codereview

Problem

I'm trying to implement a function that, given a data frame, returns the same data frame with four columns added. For each row, the new columns hold the maximum element, the second-largest element, and the column index of each. I don't mind if the two maxima are equal.

add_2max <- function(x)
{
  max1 = max(x, na.rm=TRUE)
  indmax1 = which.max(x)
  y=x[-c(indmax1)]
  max2 = max(y, na.rm=TRUE)
  indmax2 = which(x==max2)
  indmax2 = ifelse(max1==max2, indmax2[2], indmax2[1])
  x=c(x, max1, max2, indmax1, indmax2)
  return (x)
}

add_2max_df <- function(DF)
{
  NewDF=t(apply(DF, 1, add_2max))
  return(NewDF)
}


I'm sure this code can be improved. What do you recommend in order to do that? Is it fast enough?

Solution

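Here's a faster way. It calls which.max twice, once on the full row and once on the row with the first maximum removed, then shifts the second index by one whenever it falls at or after the removed position: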

add_2maxFaster <- function(x)
{
  imax1 <- which.max(x)
  imax2 <- which.max(x[-imax1])
  if (imax2 >= imax1) imax2 <- imax2 + 1L
  c(x, x[imax1], x[imax2], imax1, imax2) 
}

set.seed(42)
m <- matrix(runif(1e6), 1e4)

# Compare speed:
system.time( a1<-apply(m, 1, add_2max) )        # 0.38 secs
system.time( a2<-apply(m, 1, add_2maxFaster) )  # 0.15 secs

# ...And compare results
all.equal(a1,a2) # TRUE
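
The question asks for a data frame back, while apply() returns a matrix. Here is a minimal sketch of one way to wrap the row-wise helper so the result is a data frame with named columns. The wrapper name add_2max_df_fast and the column names max1, max2, imax1, imax2 are hypothetical, and the sketch assumes every column of DF is numeric and that DF has at least two columns.

```r
# Row-wise helper from the answer, repeated so this block runs on its own.
add_2maxFaster <- function(x) {
  imax1 <- which.max(x)
  imax2 <- which.max(x[-imax1])
  # x[-imax1] drops one element, so positions at or after it shift left by one.
  if (imax2 >= imax1) imax2 <- imax2 + 1L
  c(x, x[imax1], x[imax2], imax1, imax2)
}

# Hypothetical wrapper (not part of the original answer).
# Assumes DF is all numeric with at least two columns.
add_2max_df_fast <- function(DF) {
  n <- ncol(DF)
  # unname() drops dimnames so the helper works on plain numeric vectors.
  res <- t(apply(unname(as.matrix(DF)), 1, add_2maxFaster))
  # Keep only the four appended values; the original columns come from DF.
  extra <- as.data.frame(res[, n + 1:4, drop = FALSE])
  names(extra) <- c("max1", "max2", "imax1", "imax2")
  cbind(DF, extra)
}

DF <- data.frame(a = c(3, 8), b = c(9, 1), c = c(4, 8), d = c(7, 2))
out <- add_2max_df_fast(DF)
print(out)
# Row 1 is (3, 9, 4, 7): max1 = 9 at column 2, max2 = 7 at column 4.
# Row 2 is (8, 1, 8, 2): the tie gives max1 = 8 at column 1, max2 = 8 at column 3.
```

Binding the four new columns onto the original DF, rather than converting the whole matrix back, keeps the original column names and types unchanged.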

Code Snippets

add_2maxFaster <- function(x)
{
  imax1 <- which.max(x)
  imax2 <- which.max(x[-imax1])
  if (imax2 >= imax1) imax2 <- imax2 + 1L
  c(x, x[imax1], x[imax2], imax1, imax2) 
}

set.seed(42)
m <- matrix(runif(1e6), 1e4)

# Compare speed:
system.time( a1<-apply(m, 1, add_2max) )        # 0.38 secs
system.time( a2<-apply(m, 1, add_2maxFaster) )  # 0.15 secs

# ...And compare results
all.equal(a1,a2) # TRUE
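
As a possible alternative to calling apply() row by row, the same four values can be computed for the whole matrix at once with base R's max.col(). This is a different technique from the answer above, sketched under the assumptions that m is a numeric matrix with no NA values and at least two columns. The function name add_2max_matrix is hypothetical, and no timings are claimed for it.

```r
# Vectorized sketch using max.col(); not the answer's method.
# Assumes m is a numeric matrix with no NA values and at least two columns.
add_2max_matrix <- function(m) {
  rows <- seq_len(nrow(m))
  # Column index of the first maximum in every row.
  imax1 <- max.col(m, ties.method = "first")
  max1 <- m[cbind(rows, imax1)]
  # Hide the first maximum in each row, then search again.
  masked <- m
  masked[cbind(rows, imax1)] <- -Inf
  imax2 <- max.col(masked, ties.method = "first")
  max2 <- m[cbind(rows, imax2)]
  cbind(m, max1, max2, imax1, imax2)
}

# Rows are (3, 9, 4, 7) and (8, 1, 8, 2); matrix() fills column by column.
m_small <- matrix(c(3, 8, 9, 1, 4, 8, 7, 2), nrow = 2)
res_small <- add_2max_matrix(m_small)
print(res_small)
# Row 1: max1 = 9, max2 = 7, imax1 = 2, imax2 = 4
# Row 2: max1 = 8, max2 = 8, imax1 = 1, imax2 = 3
```

Note that the result here already has one row per input row, so no t() is needed, unlike the apply() version.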

Context

StackExchange Code Review Q#10792, answer score: 5
