pattern, R, Minor
Mean of many subsets of a dataframe
dataframe, mean, subsets, many
Problem
I have a large data frame containing many replicates.
The replicates are in groups of 3: the first set of replicates is in columns 1, 2 and 3, the second set in columns 4, 5 and 6, and so on.
Now I create a new data frame containing the mean of each set of replicates.
The code below works, but it is really clumsy, and especially the cbind and the column name setting are really ugly.
# first I create the new data frame
data.mean <- data.frame(matrix(nrow = 30))
# iterate over every third column
for (col in seq(1, length(colnames(data)), by = 3)) {
  # create a subset of the data frame, compute the row means and cbind them to the result data frame
  data.mean <- cbind(data.mean, apply(subset(data, select = seq(col, length.out = 3)), 1, mean, na.rm = TRUE))
  # set the new column name to the column name from the old dataset (name of the first replicate)
  colnames(data.mean)[ncol(data.mean)] <- colnames(data)[col]
}
I really want to improve my R coding style, so I am happy about every tip!
Solution
Here is a proposal for a different approach that doesn't use a for loop and has some simplifications.
First, an example data frame:
dat <- data.frame(a1 = 9:11, a2 = 2:4, a3 = 3:5,
                  b1 = 4:6, b2 = 5:7, b3 = 1:3)
#   a1 a2 a3 b1 b2 b3
# 1  9  2  3  4  5  1
# 2 10  3  4  5  6  2
# 3 11  4  5  6  7  3
Now, we set the number of columns per group:
# number of columns per group (1-3, 4-6)
n <- 3
Based on this, the number of groups and the column indices of each group can be calculated:
# number of groups
n_grp <- ncol(dat) / n
# 2
# column indices (one vector per group)
idx_grp <- split(seq(dat), rep(seq(n_grp), each = n))
# $`1`
# [1] 1 2 3
#
# $`2`
# [1] 4 5 6
In the next step, lapply is used to calculate the row means of each group. Computing them is much more convenient with the rowMeans function than with apply and mean.
# calculate the row means for all groups
res <- lapply(idx_grp, function(i) {
  # subset of the data frame
  tmp <- dat[i]
  # calculate row means
  rowMeans(tmp, na.rm = TRUE)
})
# $`1`
# [1] 4.666667 5.666667 6.666667
#
# $`2`
# [1] 3.333333 4.333333 5.333333
The command above returns a list. It can be transformed into a data frame:
# transform list into a data frame
dat2 <- as.data.frame(res)
#         X1       X2
# 1 4.666667 3.333333
# 2 5.666667 4.333333
# 3 6.666667 5.333333
In order to set the column names of the new data frame, we first have to extract the column names of the groups' first columns.
# extract names of first column of each group
names_frst <- names(dat)[sapply(idx_grp, "[", 1)]
# [1] "a1" "b1"
Now, these names are used for the new data frame:
# modify column names of new data frame
names(dat2) <- names_frst
#         a1       b1
# 1 4.666667 3.333333
# 2 5.666667 4.333333
# 3 6.666667 5.333333
Done.
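The steps above could also be collected into one reusable function. The following is only a sketch: the name group_row_means, its arguments, and the divisibility check are assumptions added for illustration and are not part of the original answer. It names the list before converting it, which skips the intermediate X1, X2 column names.

```r
# Hypothetical helper (not from the original answer): row means over
# consecutive blocks of `n` columns, named after each block's first column.
group_row_means <- function(dat, n = 3) {
  # assumption: the number of columns must be an exact multiple of n
  stopifnot(ncol(dat) %% n == 0)
  # column indices, one vector per group
  idx_grp <- split(seq_along(dat), rep(seq_len(ncol(dat) / n), each = n))
  # one vector of row means per group
  res <- lapply(idx_grp, function(i) rowMeans(dat[i], na.rm = TRUE))
  # name each group after its first column before building the data frame
  names(res) <- names(dat)[vapply(idx_grp, `[`, integer(1), 1)]
  as.data.frame(res)
}

# same example data as above
dat <- data.frame(a1 = 9:11, a2 = 2:4, a3 = 3:5,
                  b1 = 4:6, b2 = 5:7, b3 = 1:3)
dat_means <- group_row_means(dat, n = 3)
```

Here seq_along and seq_len replace seq, and vapply replaces sapply, because they behave predictably for empty or length-one inputs. Note that as.data.frame may alter column names that are not syntactically valid R names.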
Code Snippets
dat <- data.frame(a1 = 9:11, a2 = 2:4, a3 = 3:5,
                  b1 = 4:6, b2 = 5:7, b3 = 1:3)
#   a1 a2 a3 b1 b2 b3
# 1  9  2  3  4  5  1
# 2 10  3  4  5  6  2
# 3 11  4  5  6  7  3
# number of columns per group (1-3, 4-6)
n <- 3
# number of groups
n_grp <- ncol(dat) / n
# 2
# column indices (one vector per group)
idx_grp <- split(seq(dat), rep(seq(n_grp), each = n))
# $`1`
# [1] 1 2 3
#
# $`2`
# [1] 4 5 6
# calculate the row means for all groups
res <- lapply(idx_grp, function(i) {
  # subset of the data frame
  tmp <- dat[i]
  # calculate row means
  rowMeans(tmp, na.rm = TRUE)
})
# $`1`
# [1] 4.666667 5.666667 6.666667
#
# $`2`
# [1] 3.333333 4.333333 5.333333
# transform list into a data frame
dat2 <- as.data.frame(res)
#         X1       X2
# 1 4.666667 3.333333
# 2 5.666667 4.333333
# 3 6.666667 5.333333
Context
StackExchange Code Review Q#58523, answer score: 7