HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonModerate

best way to apply across an xts object

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
bestapplywayacrossobjectxts

Problem

As xts objects are arrays, getting apply functions to work is a little tricky if you want to preserve the dates. For example, take the xts object xx below:

xx <- xts(replicate(6, sample(c(1:10), 10, rep = T)), 
          order.by = Sys.Date() + 1:10)


Say we wish to apply a function to each column (to keep it simple say i wish to add 100 to each element of each column).

sapply(xx, function(col) col + 100)


Doing with with sapply loses the row names.

Using apply works -- but it is slow (see below).

apply(xx, 2, function(col) col + 100)


To address this, I wrote the following helper:

mapXts <- function(Xts, cFUN) {
        if(!is.xts(Xts)) stop("Must supply function with xts object")
        Z <- Xts
        for (j in 1:ncol(Xts)) {
            Z[,j] <- do.call(cFUN, list(Xts[,j]))
        }
        Z
}


The obligatory horse race:

set.seed(1)
xz <- xts(replicate(6, sample(c(1:100), 1000, rep = T)),
          order.by = Sys.Date() + 1:1000)

apFun <- function() apply(xz, 2, function(col) col + 100)
sapFun <- function() sapply(xz, function(col) col + 100)
mapXf <- function() mapXts(xz, function(col) col + 100)

op <- microbenchmark(
               app = apFun(),
               sap = sapFun(), 
               mXf = mapXf(),
               times = 1000L)


This results in speedups of about 5x over apply and is about 3x faster than sapply (and it gives me back the data in the formate i most value).

Unit: microseconds
 expr      min        lq    median        uq      max neval
  app 3083.425 3206.0295 3287.1725 3496.9720 30678.90  1000
  sap 1669.384 1763.7945 1827.0265 1952.7490 24173.88  1000
  mXf  615.279  677.0005  709.4725  787.1635 24390.62  1000


Are there any further tricks i might use to speed things up? Also, any dark corners in the case that i've missed?

Solution

You can use the function vapply. The help page of ?vapply says:


vapply is similar to sapply, but has a pre-specified type of return
value, so it can be safer (and sometimes faster) to use.

You can use

vapply(xz, function(col) col + 100, FUN.VALUE = numeric(nrow(xz))


The argument FUN.VALUE is used to tell the vapply what the function returns. Since all columns are numeric, the function returns a numeric vector of length nrow(xz) for each column of xz. Unfortunately, vapply does not preserve the dates of the xts object.

You can use the following command to generate a new object based on xz and replace all values with the matrix returned by vapply. This is very easy with the following command:

"[<-"(xz, , vapply(xz, function(col) col + 100, FUN.VALUE = numeric(nrow(xz))))


The function "[<-" copies the xz object and replaces all its values.

Alternatively, you can create a new xts object based on the matrix returned by vapply and the time information in your original xts object.

xts(vapply(xz, function(col) col + 100, FUN.VALUE = numeric(nrow(xz))),
    order.by = time(xz))


Now, we test the approaches:

vapFun <- function() "[<-"(xz, , vapply(xz, function(col) col + 100, 
                                        FUN.VALUE = numeric(nrow(xz))))

vapFun2 <- function() xts(vapply(xz, function(col) col + 100, 
                                 FUN.VALUE = numeric(nrow(xz))),
                          order.by = time(xz))

library(microbenchmark)

op <- microbenchmark(
  app = apFun(),
  sap = sapFun(), 
  mXf = mapXf(),
  vap = vapFun(),
  vap2 = vapFun2(),
  times = 1000L)

Unit: microseconds
 expr      min       lq   median       uq        max neval
  app 4066.612 4117.860 4167.324 4270.200 364484.879  1000
  sap 3337.001 3428.798 3463.680 3532.317   4310.336  1000
  mXf 1232.070 1263.531 1275.663 1295.014   2320.361  1000
  vap  984.732 1013.248 1022.726 1035.330 100172.130  1000
 vap2 1326.899 1368.562 1389.672 1428.608 126477.354  1000


As you can see, the approach with vapply and "[<-" is the fastest one.

An important information: if the function you want to apply to each column is a mathematical operation, you can apply it to the whole xts object at once, e.g., xz + 100. This is considerably faster and returns an xts object, i.e., the dates are preserved.

simpleFun <- function() xz + 100

op <- microbenchmark(
  app = apFun(),
  sap = sapFun(), 
  mXf = mapXf(),
  vap = vapFun(),
  vap2 = vapFun2(),
  simple = simpleFun(),
  times = 1000L)

Unit: microseconds
   expr      min        lq   median        uq      max neval
    app 4063.972 4119.6995 4145.120 4243.7955 85454.87  1000
    sap 3347.558 3429.6725 3450.778 3515.3745 86508.52  1000
    mXf 1236.142 1267.2030 1276.455 1289.7095 84349.75  1000
    vap  992.792 1017.7425 1024.992 1033.7785 82275.26  1000
   vap2 1336.300 1372.8755 1390.553 1423.9875 83118.79  1000
 simple   92.770   98.4305  101.080  103.9995 81233.70  1000

Code Snippets

vapply(xz, function(col) col + 100, FUN.VALUE = numeric(nrow(xz))
"[<-"(xz, , vapply(xz, function(col) col + 100, FUN.VALUE = numeric(nrow(xz))))
xts(vapply(xz, function(col) col + 100, FUN.VALUE = numeric(nrow(xz))),
    order.by = time(xz))
vapFun <- function() "[<-"(xz, , vapply(xz, function(col) col + 100, 
                                        FUN.VALUE = numeric(nrow(xz))))

vapFun2 <- function() xts(vapply(xz, function(col) col + 100, 
                                 FUN.VALUE = numeric(nrow(xz))),
                          order.by = time(xz))


library(microbenchmark)

op <- microbenchmark(
  app = apFun(),
  sap = sapFun(), 
  mXf = mapXf(),
  vap = vapFun(),
  vap2 = vapFun2(),
  times = 1000L)

Unit: microseconds
 expr      min       lq   median       uq        max neval
  app 4066.612 4117.860 4167.324 4270.200 364484.879  1000
  sap 3337.001 3428.798 3463.680 3532.317   4310.336  1000
  mXf 1232.070 1263.531 1275.663 1295.014   2320.361  1000
  vap  984.732 1013.248 1022.726 1035.330 100172.130  1000
 vap2 1326.899 1368.562 1389.672 1428.608 126477.354  1000
simpleFun <- function() xz + 100


op <- microbenchmark(
  app = apFun(),
  sap = sapFun(), 
  mXf = mapXf(),
  vap = vapFun(),
  vap2 = vapFun2(),
  simple = simpleFun(),
  times = 1000L)

Unit: microseconds
   expr      min        lq   median        uq      max neval
    app 4063.972 4119.6995 4145.120 4243.7955 85454.87  1000
    sap 3347.558 3429.6725 3450.778 3515.3745 86508.52  1000
    mXf 1236.142 1267.2030 1276.455 1289.7095 84349.75  1000
    vap  992.792 1017.7425 1024.992 1033.7785 82275.26  1000
   vap2 1336.300 1372.8755 1390.553 1423.9875 83118.79  1000
 simple   92.770   98.4305  101.080  103.9995 81233.70  1000

Context

StackExchange Code Review Q#39180, answer score: 12

Revisions (0)

No revisions yet.