
Calculate regression coefficients for specific conditions in a data frame

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Problem

I have a data set where leaf mass (MASS) was measured in 30 different tanks (TANK) on 3 dates (DATE). The tanks were also assigned to 5 different treatments.

I want to calculate the regression slope of the mass loss for each tank, so I wrote the following function in R:

k.tank <- function() {
   k <- numeric(0)
   tank <- numeric(0)
   treat <- character(0)

   for(i in TANK)
     k <- c(coef(summary(lm(log(MASS)[TANK == i] ~ DATE[TANK == i])))[2, 1], k)

   for(i in TANK)
     tank <- c(i, tank)

   for(i in TANK)
     treat <- c(TREATMENT, treat)

   k.list <- data.frame(tank, treat, k)

   return(k.list)
}

This function is currently working (but see below) but I feel like I made some non-optimal choices. Please provide feedback on how to improve this function.

NOTE: The code is working but it does return the list in reverse order 30 - 1 and it changes the treatment labels from letters ("A", "B", "C", etc...) to numbers ("1", "2", "3", etc...). Any thoughts on these idiosyncrasies would also be appreciated.

Solution

A bunch of things to say.

You're looping 3 times over the same vector TANK; once would have been enough.

You're growing vectors in each loop.

One loop just recreates the same vector reversed (more on this later).

One loop repeats a vector as many times as there are entries in TANK (which I'm pretty sure is not what you're after).

You're creating a local data.frame just to return it.
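
To make the fourth point concrete, here is a small sketch with made-up vectors (tank_ids and treatment are hypothetical stand-ins for TANK and TREATMENT). Using match() to pick one label per tank is just one possible approach, not necessarily the intended one:

```r
tank_ids  <- rep(1:3, 2)                # hypothetical: 3 tanks measured on 2 dates
treatment <- rep(c("A", "B", "C"), 2)   # hypothetical treatment label per row

treat <- character(0)
for (i in tank_ids) treat <- c(treatment, treat)  # copies the WHOLE vector on every pass

length(treat)  # 36, i.e. 6 iterations * 6 values

# One label per tank instead: take the first row where each tank id appears
ids      <- unique(tank_ids)
per_tank <- treatment[match(ids, tank_ids)]
per_tank  # "A" "B" "C"
```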

For your questions:

it does return the list in reverse order 30 - 1

You're building the vector by prepending each new value to its current state:

for(i in TANK)
     tank <- c(i, tank)

Using tank <- c(tank, i) instead appends each value and keeps the original order.
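
A small illustration of prepending versus appending, using a made-up vector x rather than the original TANK data:

```r
x <- c(10, 20, 30)  # hypothetical stand-in for TANK

prepended <- numeric(0)
for (i in x) prepended <- c(i, prepended)  # new value goes in front

appended <- numeric(0)
for (i in x) appended <- c(appended, i)    # new value goes at the end

prepended  # 30 20 10
appended   # 10 20 30
```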

Matching by the TANK name position
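
On the treatment labels turning from letters into numbers: a plausible cause, assuming TREATMENT is stored as a factor (the default for string columns in data.frame() and read.csv() before R 4.0.0), is that the factor's integer codes end up in the result instead of its labels. A minimal illustration with a made-up factor:

```r
treatment <- factor(c("A", "B", "C", "A"))  # hypothetical factor column

as.integer(treatment)    # underlying integer codes: 1 2 3 1
as.character(treatment)  # the labels: "A" "B" "C" "A"

# Converting to character before storing a value keeps the label, not the code
first_label <- as.character(treatment)[1]
first_label  # "A"
```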



Your code can be optimized to:

k.tank <- function() {
  tanks <- unique(TANK)
  # allocate the vectors once before the loop, sized to the number of tanks
  k <- numeric(length(tanks))
  treat <- character(length(tanks))

  for(i in seq_along(tanks)) { # loop on index instead of value
    # rows belonging to the current tank
    sel <- TANK == tanks[i]
    # Assign the result in the proper place directly
    # (as.character keeps the label if TREATMENT is a factor)
    treat[i] <- as.character(TREATMENT[sel][1])
    k[i] <- coef(summary(lm(log(MASS)[sel] ~ DATE[sel])))[2, 1]
  }

  # create the data.frame, as last statement this will be the function return value
  data.frame(tanks, treat, k)
}
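
The same per-tank fit can also be written with vapply(), which allocates the result vector for you. This is only a sketch: tank_slopes and its arguments are hypothetical names, and unlike the function above it takes the data as arguments instead of reading global variables. The sample values are made up so that the slopes are known in advance.

```r
# Hypothetical variant: takes the data as arguments rather than reading globals
tank_slopes <- function(tank, mass, day) {
  ids <- unique(tank)
  vapply(ids, function(id) {
    keep <- tank == id                          # rows for this tank
    coef(lm(log(mass[keep]) ~ day[keep]))[[2]]  # second coefficient = slope
  }, numeric(1))
}

tank <- rep(1:3, times = 3)
day  <- rep(c(0, 7, 14), each = 3)
mass <- 100 * exp(-0.05 * tank * day)  # made-up exact exponential decay

slopes <- tank_slopes(tank, mass, day)
slopes  # approximately -0.05 -0.10 -0.15
```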


With sample values:

set.seed(42)
MASS=runif(90)*100
TANK=rep(1:30,3)
TREATMENT=rep(rep(LETTERS[1:5],6),3)
DATE=sort(rep(seq.Date(Sys.Date(),by=1,length.out=3), 30))


This gives:

tanks treat           k
1      1     A -0.15155006
2      2     B  0.02382969
3      3     C  0.48811951
4      4     D -0.19125411
5      5     E  0.14033970
6      6     A -0.50391863
7      7     B -0.49942663
8      8     C  0.90820123
9      9     D  0.02682661
10    10     E -0.53769180
11    11     A -1.18268285
12    12     B -0.81647939
13    13     C -0.73156739
14    14     D  0.31479427
15    15     E -0.42545699
16    16     A -0.13376959
17    17     B -2.41040604
18    18     C  0.58095049
19    19     D  0.03985375
20    20     E -2.93855112
21    21     A -0.22053714
22    22     B  0.06480414
23    23     C -0.50659181
24    24     D -0.19135960
25    25     E  1.12094187
26    26     A  0.04589634
27    27     B -0.25630776
28    28     C -1.15457853
29    29     D -0.82633221
30    30     E -0.50380311


If you have read this far, and still assuming your variables all have the same length (if not, I can't really follow your code), you may try something more direct:

# Create a data.frame with your observations
df <- data.frame(TANK,MASS,TREATMENT,DATE, stringsAsFactors=FALSE)
# Create a function for ease of use, taking a data.frame as input
tank.coef <- function(int.df) {
  coef(summary(lm(log(int.df$MASS) ~ int.df$DATE)))[2,1]
}

# run the above function on global data.frame, subsetting by TANK value
k <- by(df,df$TANK,tank.coef)


This will return a by object (coef by tank):

> head(k)
df$TANK
          1           2           3           4           5           6 
-0.15155006  0.02382969  0.48811951 -0.19125411  0.14033970 -0.50391863


You can get rid of the names with as.vector and recreate a data.frame (rdf below) with columns tank, treat and k:

> head(rdf)
  tank treat           k
1    1     A -0.15155006
2    2     B  0.02382969
3    3     C  0.48811951
4    4     D -0.19125411
5    5     E  0.14033970
6    6     A -0.50391863
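
One way to build a data frame of that shape from a by result is sketched below. The construction via names(k) and match() is an assumption, not necessarily the line originally used. The data set is made up, with a numeric DAY column instead of DATE, so that the slopes are known.

```r
# Small made-up data set: 3 tanks, 3 sampling days
df <- data.frame(
  TANK      = rep(1:3, times = 3),
  TREATMENT = rep(c("A", "B", "C"), times = 3),
  DAY       = rep(c(0, 7, 14), each = 3),
  stringsAsFactors = FALSE
)
# Exact exponential decay, so the slopes are -0.1, -0.2, -0.3 per day
df$MASS <- 100 * exp(-0.1 * df$TANK * df$DAY)

slope <- function(d) coef(lm(log(MASS) ~ DAY, data = d))[["DAY"]]
k <- by(df, df$TANK, slope)

ids <- as.integer(names(k))  # tank ids are stored as the names of the by object
rdf <- data.frame(
  tank  = ids,
  treat = df$TREATMENT[match(ids, df$TANK)],  # first treatment label per tank
  k     = as.vector(k),                       # drop the by class and names
  stringsAsFactors = FALSE
)
rdf
```

Calling coef() on the lm fit directly gives the same slope as going through summary(), so the summary step can be skipped when only the coefficient is needed.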


Context

StackExchange Code Review Q#148671, answer score: 3
