Calculate regression coefficients for specific conditions in a data frame

Submitted by: @import:stackexchange-codereview··

Problem

I have a data set where leaf mass (MASS) was measured in 30 different tanks (TANK) on 3 dates (DATE). The tanks were also assigned to 5 different treatments.

I want to calculate the regression slope of the mass loss for each tank, so I wrote the following function in R:

k.tank <- function() {
   k <- numeric(0)
   tank <- numeric(0)
   treat <- character(0)

   for(i in TANK)
     k <- c(coef(summary(lm(log(MASS)[TANK == i] ~ DATE[TANK == i])))[2, 1], k)

   for(i in TANK)
     tank <- c(i, tank)

   for(i in TANK)
     treat <- c(TREATMENT, treat)

   k.list <- data.frame(tank, treat, k)

   return(k.list)
}


This function is currently working (but see below) but I feel like I made some non-optimal choices. Please provide feedback on how to improve this function.

NOTE: The code is working but it does return the list in reverse order 30 - 1 and it changes the treatment labels from letters ("A", "B", "C", etc...) to numbers ("1", "2", "3", etc...). Any thoughts on these idiosyncrasies would also be appreciated.

Solution

A bunch of things to say.

  • You're looping 3 times over the same vector, TANK; once would have been enough.

  • You're growing vectors inside each loop, which makes R build a new, longer vector on every iteration.

  • One loop just recreates the same vector reversed (more on this later).

  • One loop repeats the whole TREATMENT vector once for every entry in TANK (which I'm pretty sure is not what you're after).

  • You're creating a local data.frame just to return it.
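
To illustrate the point about growing vectors, here is a minimal sketch on a toy input (x, grown, and filled are made-up names, not from the question). Both styles produce the same values; the preallocated version simply sizes the vector once instead of rebuilding it on every iteration.

```r
# Toy input; 'x', 'grown' and 'filled' are illustrative names only
x <- c(10, 20, 30)

# Growing: a new, longer vector is built on every iteration
grown <- numeric(0)
for (v in x) grown <- c(grown, v * 2)

# Preallocating: the vector is sized once and filled in place
filled <- numeric(length(x))
for (i in seq_along(x)) filled[i] <- x[i] * 2

identical(grown, filled)   # TRUE
```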



For your questions:


it does return the list in reverse order 30 - 1

You're building the vector by prepending each new value to the values collected so far:

for(i in TANK)
     tank <- c(i, tank)


Using tank <- c(tank, i) instead appends each value and keeps the original order.
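
The difference between prepending and appending can be seen on a small stand-in vector (ids, prepended, and appended are made-up names for this sketch):

```r
ids <- c(1, 2, 3)   # stand-in for TANK

prepended <- numeric(0)
for (i in ids) prepended <- c(i, prepended)   # newest value goes first

appended <- numeric(0)
for (i in ids) appended <- c(appended, i)     # newest value goes last

prepended   # 3 2 1
appended    # 1 2 3
```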

  • Matching by the TANK name position
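
As for the treatment labels turning into numbers: a likely cause, assuming TREATMENT is stored as a factor (the question does not show how the data were read in), is that a factor is a vector of integer codes with labels attached, and some operations keep only the codes. A minimal sketch with a made-up factor:

```r
# Hypothetical treatment column stored as a factor
treatment <- factor(c("A", "B", "C", "A"))

as.integer(treatment)     # underlying codes: 1 2 3 1
as.character(treatment)   # labels: "A" "B" "C" "A"
```

Converting with as.character() before combining or assigning the values keeps the letters.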



Your code can be optimized to:

k.tank <- function() {
  tanks <- unique(TANK)
  # allocate the vectors once before the loop, one slot per tank
  k <- numeric(length(tanks))
  treat <- character(length(tanks))

  for(i in seq_along(tanks)) { # loop on the index instead of the value
    # rows belonging to the current tank
    rows <- TANK == tanks[i]
    # assign the results directly in the proper place
    treat[i] <- as.character(TREATMENT[rows][1])
    k[i] <- coef(summary(lm(log(MASS)[rows] ~ DATE[rows])))[2, 1]
  }

  # create the data.frame; as the last statement, this is the function's return value
  data.frame(tanks, treat, k)
}
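
If preallocating and indexing by hand feels error-prone, the same per-tank slope could also be computed with vapply, which sizes the result itself. This is only a sketch on made-up data: tank, day, mass, and slopes are hypothetical names, and the day values are plain numbers rather than dates.

```r
# Made-up data: 3 tanks, each measured on days 0, 1 and 2
tank <- rep(1:3, times = 3)
day  <- rep(0:2, each = 3)
mass <- c(100, 80, 90,  50, 60, 45,  25, 45, 22.5)

tanks <- unique(tank)

# One slope of log(mass) against day per tank
slopes <- vapply(tanks, function(id) {
  keep <- tank == id
  unname(coef(lm(log(mass[keep]) ~ day[keep]))[2])
}, numeric(1))

data.frame(tank = tanks, k = slopes)
```

The made-up masses halve each day in tanks 1 and 3 and shrink by a quarter each day in tank 2, so the slopes come out as log(0.5) and log(0.75), which makes the result easy to check by hand.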


With sample values:

set.seed(42)
MASS=runif(90)*100
TANK=rep(1:30,3)
TREATMENT=rep(rep(LETTERS[1:5],6),3)
DATE=sort(rep(seq.Date(Sys.Date(),by=1,length.out=3), 30))


This gives:

tanks treat           k
1      1     A -0.15155006
2      2     B  0.02382969
3      3     C  0.48811951
4      4     D -0.19125411
5      5     E  0.14033970
6      6     A -0.50391863
7      7     B -0.49942663
8      8     C  0.90820123
9      9     D  0.02682661
10    10     E -0.53769180
11    11     A -1.18268285
12    12     B -0.81647939
13    13     C -0.73156739
14    14     D  0.31479427
15    15     E -0.42545699
16    16     A -0.13376959
17    17     B -2.41040604
18    18     C  0.58095049
19    19     D  0.03985375
20    20     E -2.93855112
21    21     A -0.22053714
22    22     B  0.06480414
23    23     C -0.50659181
24    24     D -0.19135960
25    25     E  1.12094187
26    26     A  0.04589634
27    27     B -0.25630776
28    28     C -1.15457853
29    29     D -0.82633221
30    30     E -0.50380311


If you've read this far, and still assuming your variables all have the same length (if they don't, I can't really make sense of your code), you may want to try something more direct:

# Create a data.frame with your observations
df <- data.frame(TANK,MASS,TREATMENT,DATE, stringsAsFactors=FALSE)
# Create a function for ease of use, taking a data.frame as input
tank.coef <- function(int.df) {
  coef(summary(lm(log(int.df$MASS) ~ int.df$DATE)))[2,1]
}

# run the above function on global data.frame, subsetting by TANK value
k <- by(df,df$TANK,tank.coef)


This will return a by object (the coefficient for each tank):

> head(k)
df$TANK
          1           2           3           4           5           6 
-0.15155006  0.02382969  0.48811951 -0.19125411  0.14033970 -0.50391863


You can get rid of the names with as.vector and recreate a data.frame with:

> rdf  head(rdf)
  tank treat           k
1    1     A -0.15155006
2    2     B  0.02382969
3    3     C  0.48811951
4    4     D -0.19125411
5    5     E  0.14033970
6    6     A -0.50391863
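
One way such a data frame could be put together is sketched below on made-up data. It is an illustration rather than the answer's exact code, and obs, slope_by_tank, first_row, and result are hypothetical names. It relies on by() naming its results after the grouping values, so each tank's treatment can be looked up by matching those names against the tank column.

```r
# Made-up observations: 3 tanks, 3 days each
obs <- data.frame(
  tank  = rep(1:3, times = 3),
  treat = rep(c("A", "B", "C"), times = 3),
  day   = rep(0:2, each = 3),
  mass  = c(100, 80, 90,  50, 60, 45,  25, 45, 22.5),
  stringsAsFactors = FALSE
)

# Slope of log(mass) against day, computed separately for each tank
slope_by_tank <- by(obs, obs$tank, function(d) {
  unname(coef(lm(log(mass) ~ day, data = d))[2])
})

# Row of the first observation of each tank, found via the names of the result
first_row <- match(names(slope_by_tank), obs$tank)

result <- data.frame(
  tank  = obs$tank[first_row],
  treat = obs$treat[first_row],
  k     = as.vector(slope_by_tank)   # drop the names, keep the values
)
result
```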

Context

StackExchange Code Review Q#148671, answer score: 3
