patternpythonMinor
Achieve a dataframe from a dynamic list of lists
Viewed 0 times
achievelistsdataframedynamiclistfrom
Problem
This code works for me, but I am trying to improve my coding and am looking for advice on how to improve it. Specifically I want to simplify this so that perhaps I can have a dynamically updating dataframe that tracks the "site" that I am on, and the "cutoff" value that I am on, and records the calculations for that site/cutoff value combination. Right now, it's a little clumsy and admittedly very ugly. Through this process, though, I learned about "assign" for creating assigning variables on the fly. I think I just need maybe one or two more concepts before I can really clean this up.
Any help is greatly appreciated!
calcs <- function(site, cutoff){} #Returns a 4 item list
cutoffmatrix <- c(0, 0.005, 0.01, 0.015) #matrix that I want to loop through
for(i in 1:length(cutoffmatrix)){ #Loop that uses my function for cutoffmatrix value
test <- list()
co <- cutoffmatrix[i]
assign(paste("TC", i, sep = ""), calcs("TC", co))
}
l_TC <- list(TC1, TC2, TC3, TC4) #combines the loop results into a list for that site
library(plyr)
df_TC <- ldply (l_TC, data.frame) #transform lists of lists into dataframe
for(i in 1:length(cutoffmatrix)){ #same thing, but for next site
test <- list()
co <- cutoffmatrix[i]
assign(paste("DC", i, sep = ""), calcs("DC", co))
}
l_DC <- list(DC1, DC2, DC3, DC4)
df_DC <- ldply(l_DC, data.frame)
final <- rbind (df_TC, df_DC) #combines both site's results into a dataframe
final <- cbind(final, cutoffmatrix) #adds cutoff values to dataframe
namelist <- c("TC", "TC", "TC", "TC", "DC", "DC", "DC", "DC") #adds sites to dataframe
final <- cbind(final, namelist)Any help is greatly appreciated!
Solution
Reviewing line-by-line:
If this is a vector, why call it a matrix? I would just call it just
The use of
The use of
What would happen if instead of 4 objects, you had 2000 objects? Do you see yourself typing
But wait. This is still abominable. For one reason, you could be creating thousands of such objects and "polluting" your environment with similar objects. Why not put them all into a single object? Also imagine someone coming across
So what is the alternative? Do not create
Look how shorter this is! And in the following, 1) your code will only use
Next:
So this converts the lists into data.frames and rbinds everything together. You do not really need
But before doing that: Why do you need to convert the list items to data.frames, then bind? Could you instead rewrite
Instead of having to add
This code has a baked-in assumption that there are four cutoffs. If you were to modify
Still, not great. Similar to above, the best would be if
Last. What if instead of two sites
cutoffmatrix <- c(0, 0.005, 0.01, 0.015)If this is a vector, why call it a matrix? I would just call it just
cutoffs.for(i in 1:length(cutoffmatrix)){
test <- list()
co <- cutoffmatrix[i]
assign(paste("TC", i, sep = ""), calcs("TC", co))
}The use of
1:length(cutoffmatrix) in a for loop is a common mistake. You'll probably agree that, by design, the code in the loop will be executed length(cutoffmatrix). Well, that's not exactly always the case. In particular, when cutoffmatrix is empty: cutoffmatrix <- c(). You would want the loop to be executed length(cutoffmatrix) i.e. zero times. However, in this unfortunate case, 1:length(cutoffmatrix) would evaluate to 1:0 i.e. c(1, 0). Instead of zero iteration, you'll have two and likely cause an error. Instead of 1:length(cutoffmatrix), you should use seq_along(cutoffmatrix) or seq_len(length(cutoffmatrix)). Both seq_along and seq_len will do the right thing when their input is empty or zero respectively.The use of
assign to dynamically create objects is, excuse my word, an abomination. I'll try to show you a couple reasons why. Via your loop, you have managed to create four objects: TC1, TC2, TC3, TC4 which you later use in your code:l_TC <- list(TC1, TC2, TC3, TC4)What would happen if instead of 4 objects, you had 2000 objects? Do you see yourself typing
l_TC <- list(TC1, TC2, TC3, TC4, ..., TC2000)? No. Maybe you will find about the get or mget function so you can do:l_TC <- mget(paste("TC", seq_along(cutoffmatrix), sep = ""))But wait. This is still abominable. For one reason, you could be creating thousands of such objects and "polluting" your environment with similar objects. Why not put them all into a single object? Also imagine someone coming across
TC1 in your environment and searching for "TC1" through your code to find where it got created: no luck. Finally, having to use mget to indirectly refer to the TC* objects is a bit complex, no? And you have to remember that the objects are labelled 1 through length(cutoffmatrix)...So what is the alternative? Do not create
length(cutoffmatrix) objects but just one object: a list containing length(cutoffmatrix) items. How? Use lapply or Map:l_TC <- lapply(cutoffvec, calcs, site = "TC")Look how shorter this is! And in the following, 1) your code will only use
l_TC to refer to those 4 or 2000 objects stored within 2) Someone stumbling upon l_TC in your environment will see exactly where it was created in your code. By all means, please never use assign again and erase that function from your memory!Next:
library(plyr)
df_TC <- ldply (l_TC, data.frame) #transform lists of lists into dataframeSo this converts the lists into data.frames and rbinds everything together. You do not really need
plyr for that. You can first turn the lists into data.frames by doing lapply(l_TC, as.data.frame). And you can do the binding using do.call(rbind, ...), so:df_TC <- do.call(rbind, lapply(l_TC, as.data.frame))But before doing that: Why do you need to convert the list items to data.frames, then bind? Could you instead rewrite
calcs so that it returns a data.frame? And better, could calc take a vector of cutoffs instead of a single value and return a data.frame with length(cutoffs) rows?final <- rbind(df_TC, df_DC)
final <- cbind(final, cutoffmatrix)Instead of having to add
cutoffmatrix, could calcs already include a cutoff item in its output list?namelist <- c("TC", "TC", "TC", "TC", "DC", "DC", "DC", "DC")
final <- cbind(final, namelist)This code has a baked-in assumption that there are four cutoffs. If you were to modify
cutoffmatrix so its length is not four, then this would break. Instead, you could do:namelist <- rep(c("TC", "DC"), each = length(cutoffmatrix))Still, not great. Similar to above, the best would be if
calcs could include a site = site item in its output list.Last. What if instead of two sites
"TC" and "DC" you had many more websites to use? How could you modify your code so you do not have to write "TC" or "DC" more than once and have to rely on creating too many objects? Maybe use some of the concepts I have shown you? Good luck!Code Snippets
cutoffmatrix <- c(0, 0.005, 0.01, 0.015)for(i in 1:length(cutoffmatrix)){
test <- list()
co <- cutoffmatrix[i]
assign(paste("TC", i, sep = ""), calcs("TC", co))
}l_TC <- list(TC1, TC2, TC3, TC4)l_TC <- mget(paste("TC", seq_along(cutoffmatrix), sep = ""))l_TC <- lapply(cutoffvec, calcs, site = "TC")Context
StackExchange Code Review Q#121484, answer score: 4
Revisions (0)
No revisions yet.