Calculate regression coefficients for specific conditions in a data frame
Problem
I have a data set where leaf mass (MASS) was measured in 30 different tanks (TANK) on 3 dates (DATE). The tanks were also assigned to 5 different treatments.
I want to calculate the regression slope of the mass loss for each tank, so I wrote the following function in R:
    k.tank <- function() {
      k <- numeric(0)
      tank <- numeric(0)
      treat <- character(0)
      for(i in TANK)
        k <- c(coef(summary(lm(log(MASS)[TANK == i] ~ DATE[TANK == i])))[2, 1], k)
      for(i in TANK)
        tank <- c(i, tank)
      for(i in TANK)
        treat <- c(TREATMENT, treat)
      k.list <- data.frame(tank, treat, k)
      return(k.list)
    }
This function is currently working (but see below), but I feel like I made some non-optimal choices. Please provide feedback on how to improve this function.
NOTE: The code is working but it does return the list in reverse order 30 - 1 and it changes the treatment labels from letters ("A", "B", "C", etc...) to numbers ("1", "2", "3", etc...). Any thoughts on these idiosyncrasies would also be appreciated.
Solution
A bunch of things to say.
- You're looping three times over the same vector TANK; once would have been enough.
- You're growing vectors inside each loop.
- One loop just recreates the same vector reversed (more on this later).
- One loop repeats a vector as many times as there are entries in TANK (which I'm pretty sure is not what you're after).
- You're creating a local data.frame just to return it.
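The first and fourth points can be checked directly. A minimal sketch, assuming data shaped like the question's (30 tanks measured on 3 dates, so 90 rows; the vectors below are made up for illustration):

```r
# Hypothetical data with the same shape as in the question: 30 tanks x 3 dates.
TANK <- rep(1:30, 3)
TREATMENT <- rep(rep(LETTERS[1:5], 6), 3)

# for (i in TANK) visits every observation, not every distinct tank.
n_iterations <- length(TANK)        # 90
n_tanks <- length(unique(TANK))     # 30

# Prepending the whole TREATMENT vector on each pass repeats it length(TANK) times.
treat <- character(0)
for (i in TANK) treat <- c(TREATMENT, treat)
length(treat)                       # 90 * 90 = 8100
```

So each tank's model is refitted three times, and treat ends up far longer than the other columns.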
For your questions:
"it does return the list in reverse order 30 - 1"
You're building the vector by prepending each value to its current state:
    for(i in TANK)
      tank <- c(i, tank)
Using tank <- c(tank, i) instead appends each value and keeps the original order.
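The ordering effect is easy to reproduce on a small vector. A short sketch (vals, prepended and appended are illustrative names, not from the question):

```r
vals <- 1:5
prepended <- integer(0)
appended <- integer(0)
for (i in vals) {
  prepended <- c(i, prepended)  # newest value goes first: order is reversed
  appended <- c(appended, i)    # newest value goes last: order is kept
}
prepended  # 5 4 3 2 1
appended   # 1 2 3 4 5
```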
- Matching by the TANK name position
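On the second oddity (letters turning into numbers), one likely cause, assuming TREATMENT is stored as a factor (the question does not show how the data was loaded): a factor is a vector of integer codes with labels attached, and in R versions before 4.1.0 c() on a factor returned those integer codes. A small sketch of the difference, with a made-up treatment vector:

```r
# Hypothetical factor standing in for TREATMENT.
treatment <- factor(c("A", "B", "C", "A"))

codes <- as.integer(treatment)     # 1 2 3 1: the internal codes
labels <- as.character(treatment)  # "A" "B" "C" "A": the original labels
```

If that is what happens in your data, converting with as.character(TREATMENT) before combining should keep the letters.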
Your code can be optimized to:
    k.tank <- function() {
      tanks <- unique(TANK)
      # allocate the vectors once, before the loop
      k <- numeric(length(tanks))
      treat <- character(length(tanks))
      for (i in seq_along(tanks)) { # loop over indices instead of values
        # rows belonging to the current tank (match on the tank value, not on i)
        sel <- TANK == tanks[i]
        # assign each result directly to its position;
        # as.character keeps the labels if TREATMENT is a factor
        treat[i] <- as.character(TREATMENT[sel][1])
        k[i] <- coef(summary(lm(log(MASS)[sel] ~ DATE[sel])))[2, 1]
      }
      # create the data.frame; as the last statement, it is the function's return value
      data.frame(tanks, treat, k)
    }
With sample values:
    set.seed(42)
    MASS <- runif(90) * 100
    TANK <- rep(1:30, 3)
    TREATMENT <- rep(rep(LETTERS[1:5], 6), 3)
    DATE <- sort(rep(seq.Date(Sys.Date(), by = 1, length.out = 3), 30))
This gives:
tanks treat k
1 1 A -0.15155006
2 2 B 0.02382969
3 3 C 0.48811951
4 4 D -0.19125411
5 5 E 0.14033970
6 6 A -0.50391863
7 7 B -0.49942663
8 8 C 0.90820123
9 9 D 0.02682661
10 10 E -0.53769180
11 11 A -1.18268285
12 12 B -0.81647939
13 13 C -0.73156739
14 14 D 0.31479427
15 15 E -0.42545699
16 16 A -0.13376959
17 17 B -2.41040604
18 18 C 0.58095049
19 19 D 0.03985375
20 20 E -2.93855112
21 21 A -0.22053714
22 22 B 0.06480414
23 23 C -0.50659181
24 24 D -0.19135960
25 25 E 1.12094187
26 26 A 0.04589634
27 27 B -0.25630776
28 28 C -1.15457853
29 29 D -0.82633221
30 30 E -0.50380311
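The same per-tank computation can also be written without manual preallocation. This is only a sketch, not the function above: it uses vapply, takes the slope from coef(lm(...))[2] (the same estimate as coef(summary(...))[2, 1]), and converts DATE with as.numeric so the slope is explicitly per day. The names id, keep and res are illustrative.

```r
# Sample data, generated the same way as above.
set.seed(42)
MASS <- runif(90) * 100
TANK <- rep(1:30, 3)
TREATMENT <- rep(rep(LETTERS[1:5], 6), 3)
DATE <- sort(rep(seq.Date(Sys.Date(), by = 1, length.out = 3), 30))

tanks <- unique(TANK)

# vapply allocates the result and checks that each slope is a single number.
k <- vapply(tanks, function(id) {
  keep <- TANK == id
  fit <- lm(log(MASS[keep]) ~ as.numeric(DATE[keep]))
  unname(coef(fit)[2])  # second coefficient is the slope
}, numeric(1))

# First treatment label found for each tank.
treat <- as.character(TREATMENT[match(tanks, TANK)])

res <- data.frame(tanks, treat, k)
head(res, 3)
```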
Now that you have read this far, and still assuming your variables all have the same length (if not, I can't really make sense of your code), you may try something more direct:
    # Create a data.frame with your observations
    df <- data.frame(TANK, MASS, TREATMENT, DATE, stringsAsFactors = FALSE)
    # Create a function for ease of use, taking a data.frame as input
    tank.coef <- function(int.df) {
      coef(summary(lm(log(int.df$MASS) ~ int.df$DATE)))[2, 1]
    }
    # Run the above function on the global data.frame, subsetting by TANK value
    k <- by(df, df$TANK, tank.coef)
This will return a by object (coef by tank):
> head(k)
df$TANK
1 2 3 4 5 6
-0.15155006 0.02382969 0.48811951 -0.19125411 0.14033970 -0.50391863
You can get rid of the names with as.vector and recreate a data.frame (rdf):
> head(rdf)
tank treat k
1 1 A -0.15155006
2 2 B 0.02382969
3 3 C 0.48811951
4 4 D -0.19125411
5 5 E 0.14033970
6 6 A -0.50391863
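The statement that builds rdf is not shown above. One way it might look, offered as an assumption rather than the original code: take the tank ids from the dimnames of the by object and look up each tank's treatment with match.

```r
# Sample data and per-tank slopes, as in the by() example.
set.seed(42)
df <- data.frame(
  TANK = rep(1:30, 3),
  MASS = runif(90) * 100,
  TREATMENT = rep(rep(LETTERS[1:5], 6), 3),
  DATE = sort(rep(seq.Date(Sys.Date(), by = 1, length.out = 3), 30)),
  stringsAsFactors = FALSE
)
tank.coef <- function(int.df) {
  coef(summary(lm(log(int.df$MASS) ~ int.df$DATE)))[2, 1]
}
k <- by(df, df$TANK, tank.coef)

# Assumed reconstruction of rdf (not taken from the original answer).
tank <- as.integer(dimnames(k)[[1]])           # tank ids carried by the by object
rdf <- data.frame(
  tank = tank,
  treat = df$TREATMENT[match(tank, df$TANK)],  # treatment of each tank's first row
  k = as.vector(k)                             # drop the by/array attributes
)
head(rdf)
```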
Context
StackExchange Code Review Q#148671, answer score: 3