patternpythonModerate
best way to apply across an xts object
Viewed 0 times
bestapplywayacrossobjectxts
Problem
As
Say we wish to apply a function to each column (to keep it simple say i wish to add 100 to each element of each column).
Doing with with
Using
To address this, I wrote the following helper:
The obligatory horse race:
This results in speedups of about 5x over
Are there any further tricks i might use to speed things up? Also, any dark corners in the case that i've missed?
xts objects are arrays, getting apply functions to work is a little tricky if you want to preserve the dates. For example, take the xts object xx below: xx <- xts(replicate(6, sample(c(1:10), 10, rep = T)),
order.by = Sys.Date() + 1:10)Say we wish to apply a function to each column (to keep it simple say i wish to add 100 to each element of each column).
sapply(xx, function(col) col + 100)Doing with with
sapply loses the row names. Using
apply works -- but it is slow (see below). apply(xx, 2, function(col) col + 100)To address this, I wrote the following helper:
mapXts <- function(Xts, cFUN) {
if(!is.xts(Xts)) stop("Must supply function with xts object")
Z <- Xts
for (j in 1:ncol(Xts)) {
Z[,j] <- do.call(cFUN, list(Xts[,j]))
}
Z
}The obligatory horse race:
set.seed(1)
xz <- xts(replicate(6, sample(c(1:100), 1000, rep = T)),
order.by = Sys.Date() + 1:1000)
apFun <- function() apply(xz, 2, function(col) col + 100)
sapFun <- function() sapply(xz, function(col) col + 100)
mapXf <- function() mapXts(xz, function(col) col + 100)
op <- microbenchmark(
app = apFun(),
sap = sapFun(),
mXf = mapXf(),
times = 1000L)This results in speedups of about 5x over
apply and is about 3x faster than sapply (and it gives me back the data in the formate i most value). Unit: microseconds
expr min lq median uq max neval
app 3083.425 3206.0295 3287.1725 3496.9720 30678.90 1000
sap 1669.384 1763.7945 1827.0265 1952.7490 24173.88 1000
mXf 615.279 677.0005 709.4725 787.1635 24390.62 1000Are there any further tricks i might use to speed things up? Also, any dark corners in the case that i've missed?
Solution
You can use the function
vapply is similar to sapply, but has a pre-specified type of return
value, so it can be safer (and sometimes faster) to use.
You can use
The argument
You can use the following command to generate a new object based on
The function
Alternatively, you can create a new
Now, we test the approaches:
As you can see, the approach with
An important information: if the function you want to apply to each column is a mathematical operation, you can apply it to the whole
vapply. The help page of ?vapply says:vapply is similar to sapply, but has a pre-specified type of return
value, so it can be safer (and sometimes faster) to use.
You can use
vapply(xz, function(col) col + 100, FUN.VALUE = numeric(nrow(xz))The argument
FUN.VALUE is used to tell the vapply what the function returns. Since all columns are numeric, the function returns a numeric vector of length nrow(xz) for each column of xz. Unfortunately, vapply does not preserve the dates of the xts object.You can use the following command to generate a new object based on
xz and replace all values with the matrix returned by vapply. This is very easy with the following command:"[<-"(xz, , vapply(xz, function(col) col + 100, FUN.VALUE = numeric(nrow(xz))))The function
"[<-" copies the xz object and replaces all its values.Alternatively, you can create a new
xts object based on the matrix returned by vapply and the time information in your original xts object.xts(vapply(xz, function(col) col + 100, FUN.VALUE = numeric(nrow(xz))),
order.by = time(xz))Now, we test the approaches:
vapFun <- function() "[<-"(xz, , vapply(xz, function(col) col + 100,
FUN.VALUE = numeric(nrow(xz))))
vapFun2 <- function() xts(vapply(xz, function(col) col + 100,
FUN.VALUE = numeric(nrow(xz))),
order.by = time(xz))
library(microbenchmark)
op <- microbenchmark(
app = apFun(),
sap = sapFun(),
mXf = mapXf(),
vap = vapFun(),
vap2 = vapFun2(),
times = 1000L)
Unit: microseconds
expr min lq median uq max neval
app 4066.612 4117.860 4167.324 4270.200 364484.879 1000
sap 3337.001 3428.798 3463.680 3532.317 4310.336 1000
mXf 1232.070 1263.531 1275.663 1295.014 2320.361 1000
vap 984.732 1013.248 1022.726 1035.330 100172.130 1000
vap2 1326.899 1368.562 1389.672 1428.608 126477.354 1000As you can see, the approach with
vapply and "[<-" is the fastest one.An important information: if the function you want to apply to each column is a mathematical operation, you can apply it to the whole
xts object at once, e.g., xz + 100. This is considerably faster and returns an xts object, i.e., the dates are preserved.simpleFun <- function() xz + 100
op <- microbenchmark(
app = apFun(),
sap = sapFun(),
mXf = mapXf(),
vap = vapFun(),
vap2 = vapFun2(),
simple = simpleFun(),
times = 1000L)
Unit: microseconds
expr min lq median uq max neval
app 4063.972 4119.6995 4145.120 4243.7955 85454.87 1000
sap 3347.558 3429.6725 3450.778 3515.3745 86508.52 1000
mXf 1236.142 1267.2030 1276.455 1289.7095 84349.75 1000
vap 992.792 1017.7425 1024.992 1033.7785 82275.26 1000
vap2 1336.300 1372.8755 1390.553 1423.9875 83118.79 1000
simple 92.770 98.4305 101.080 103.9995 81233.70 1000Code Snippets
vapply(xz, function(col) col + 100, FUN.VALUE = numeric(nrow(xz))"[<-"(xz, , vapply(xz, function(col) col + 100, FUN.VALUE = numeric(nrow(xz))))xts(vapply(xz, function(col) col + 100, FUN.VALUE = numeric(nrow(xz))),
order.by = time(xz))vapFun <- function() "[<-"(xz, , vapply(xz, function(col) col + 100,
FUN.VALUE = numeric(nrow(xz))))
vapFun2 <- function() xts(vapply(xz, function(col) col + 100,
FUN.VALUE = numeric(nrow(xz))),
order.by = time(xz))
library(microbenchmark)
op <- microbenchmark(
app = apFun(),
sap = sapFun(),
mXf = mapXf(),
vap = vapFun(),
vap2 = vapFun2(),
times = 1000L)
Unit: microseconds
expr min lq median uq max neval
app 4066.612 4117.860 4167.324 4270.200 364484.879 1000
sap 3337.001 3428.798 3463.680 3532.317 4310.336 1000
mXf 1232.070 1263.531 1275.663 1295.014 2320.361 1000
vap 984.732 1013.248 1022.726 1035.330 100172.130 1000
vap2 1326.899 1368.562 1389.672 1428.608 126477.354 1000simpleFun <- function() xz + 100
op <- microbenchmark(
app = apFun(),
sap = sapFun(),
mXf = mapXf(),
vap = vapFun(),
vap2 = vapFun2(),
simple = simpleFun(),
times = 1000L)
Unit: microseconds
expr min lq median uq max neval
app 4063.972 4119.6995 4145.120 4243.7955 85454.87 1000
sap 3347.558 3429.6725 3450.778 3515.3745 86508.52 1000
mXf 1236.142 1267.2030 1276.455 1289.7095 84349.75 1000
vap 992.792 1017.7425 1024.992 1033.7785 82275.26 1000
vap2 1336.300 1372.8755 1390.553 1423.9875 83118.79 1000
simple 92.770 98.4305 101.080 103.9995 81233.70 1000Context
StackExchange Code Review Q#39180, answer score: 12
Revisions (0)
No revisions yet.