What is the best approach to use in R and why?
Problem
I'm just starting to work with R, and I found some tutorials and exercises online.
I want to split one variable into two groups: values greater than or equal to 79, and values less than 79.
Perhaps because I'm used to Python, my first approach was to do something like this:

z <- numeric(length(faithful$waiting))
n = 0
for (i in faithful$waiting) {
  n = 1 + n
  if (i < 79) z[n] <- 1
}

But I found many tutorials that use this solution instead:

min_wait <- min(faithful$waiting)-0.1
max_wait <- max(faithful$waiting)
cutof <- c(min_wait,79,max_wait)
waiting_cat <- cut (faithful$waiting, breaks=cutof)

What is the best way to do something like this? And can someone explain why?
Thank you!
Solution
As you realize, your first approach works (it gives a result consistent with the criteria you specify), but it is not idiomatic R. Iterating over the elements of a set, list, or vector is idiomatic in Python, and it does have a place in R as well. However, this approach misses two aspects of R: inherent vectorization and the factor data type.
In R, all basic types are vectors and can hold multiple items (of the same type). A single value is just the special case of a length-1 vector. Since everything is a vector, most standard functions and operators are designed to operate on the whole vector at once. They are implicitly vectorized over the elements of the vector, so no explicit for loop is needed to operate on each element. The first simplification is therefore to eliminate the for loop over the elements of faithful$waiting and just do the comparison on the whole vector.
> faithful$waiting < 79
[1] FALSE TRUE TRUE TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE FALSE
[13] TRUE TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE
[25] TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE
[37] TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE TRUE FALSE TRUE TRUE
[49] FALSE TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE FALSE
[61] TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE
[73] FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE
[85] TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE
[97] FALSE TRUE TRUE FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE
[109] FALSE FALSE TRUE TRUE FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
[121] TRUE TRUE TRUE TRUE FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
[133] TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE
[145] TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
[157] FALSE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
[169] TRUE FALSE TRUE TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE TRUE
[181] TRUE TRUE FALSE FALSE TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
[193] TRUE FALSE TRUE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE TRUE
[205] TRUE TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE
[217] TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE
[229] TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE FALSE TRUE
[241] TRUE TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE TRUE TRUE FALSE
[253] TRUE TRUE FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
[265] TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE
This brings up another aspect. faithful$waiting is of length 272, while 79 is of length 1. Argument recycling causes the 79 to be repeated until it is the same length as faithful$waiting. The comparison is then done element-wise, returning a logical vector. If you want it as a numeric vector (as in your first example), it can be converted directly: FALSE becomes 0 and TRUE becomes 1.
> as.numeric(faithful$waiting < 79)
[1] 0 1 1 1 0 1 0 0 1 0 1 0 1 1 0 1 1 0 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1
[38] 0 1 0 0 1 0 1 1 0 1 1 0 1 1 0 1 0 1 0 1 1 1 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1
[75] 1 1 1 1 1 0 1 0 1 1 1 0 1 0 1 0 1 0 1 1 1 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1
[112] 1 0 0 1 0 1 0 1 0 1 1 1 1 0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 0 1 0 1 1 1 0 1
[149] 0 1 1 1 1 0 1 1 0 0 1 0 1 0 1 1 1 1 1 0 1 0 1 1 1 1 0 0 1 1 0 1 1 1 0 0 1
[186] 1 0 1 0 1 0 1 1 0 1 0 0 1 1 1 1 0 0 1 1 1 1 0 1 0 1 0 1 1 1 1 1 0 1 1 1 0
[223] 1 1 1 0 1 1 1 0 1 1 0 1 0 1 1 1 0 1 1 1 0 1 0 0 1 0 1 1 1 0 1 1 0 0 1 0 1
[260] 0 1 0 1 0 1 1 1 0 1 0 1 1
The second aspect of R I mentioned was factors. A factor is a data type whose values are drawn from a predefined set of levels. In some languages, these are called enumerated types. Internally, a factor is stored as integer indexes into a vector of level labels (and this sometimes shows through).
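To make the "stored as integer indexes" point concrete, here is a minimal sketch. The vector `waits` and its values are made up for illustration and are not taken from `faithful`. It builds a factor straight from a logical comparison, then looks at the level labels and the underlying integer codes.

```r
# Hypothetical sample data, chosen only for illustration
waits <- c(79, 54, 74, 85, 62)

# Build a two-level factor directly from the logical comparison
f <- factor(waits < 79, levels = c(TRUE, FALSE), labels = c("<79", "79+"))

levels(f)      # the predefined set of values: "<79" "79+"
as.integer(f)  # the integer codes stored internally: 2 1 1 2 1
table(f)       # counts per level

# The internal codes "show through" on a naive conversion:
# as.numeric() on a factor returns the codes, not the original numbers.
as.numeric(f)
```

Note that the first element, which is exactly 79, is not less than 79 and so gets code 2, the "79+" level.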
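You can create a factor from another vector by defining the levels it can take and, optionally, the labels these levels should be displayed as. Continuing the example:
```
> factor(as.numeric(faithful$waiting < 79), levels=c(1,0), labels=c("<79", "79+"))
[1] 79+ <79 <79 <79 79+ <79 79+ 79+ <79 79+ <79 79+ <79 <79 79+ <79 <79 79+
[19] <79 79+ <79 <79 <79 <79 <79 79+ <79 <79 <79 79+ <79 <79 <79 79+ <79 <79
[37] <79 79+ <79 79+ 79+ <79 79+ <79 <79 79+ <79 <79 79+ <79 <79 79+ <79 79+
[55] <79 79+ <79 <79 <79 79+ <79 79+ <79 79+ <79 79+ <79 <79 <79 <79 79+ <79
[73] 79+ <79 <79 <79 <79 <79 <79 79+ <79 79+ <79 <79 <79 79+ <79 79+ <79 79+
[91] <79 79+ <79 <79 <79 <79 79+ <79 <79 79+ <79 79+ <79 79+ 79+ <79 79+ <79
[109] 79+ 79+ <79 <79 79+ 79+ <79 79+ <79 79+ <79 79+ <79 <79 <79 <79 79+ 79+
[127] <79 79+ <79 79+ <79 79+ <79 79+ <79 79+ <79 79+ <79 79+ 79+ <79 79+ <79
[145] <79 <79 79+ <79 79+ <79 <79 <79 <79 79+ <79 <79 79+ 79+ <79 79+ <79 79+
[163] <79 <79 <79 <79 <79 79+ <79 79+ <79 <79 <79 <79 79+ 79+ <79 <79 79+ <79
[181] <79 <79 79+ 79+ <79 <79 79+ <79 79+ <79 79+ <79 <79 79+ <79 79+ 79+ <79
[199] <79 <79 <79 79+ 79+ <79 <79 <79 <79 79+ <79 79+ <79 79+ <79 <79 <79 <79
[217] <79 79+ <79 <79 <79 79+ <79 <79 <79 79+ <79 <79 <79 79+ <79 <79 79+ <79
[235] 79+ <79 <79 <79 79+ <79 <79 <79 79+ <79 79+ 79+ <79 79+ <79 <79 <79 79+
[253] <79 <79 79+ 79+ <79 79+ <79 79+ <79 79+ <79 79+ <79 <79 <79 79+ <79 79+
[271] <79 <79
Levels: <79 79+
```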
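As a rough end-to-end check, the sketch below runs on the built-in `faithful` data and compares the explicit loop with the vectorized comparison, then builds the factor with `cut()`. The names `w`, `z_loop`, and `z_vec` are illustrative, and passing `labels` to `cut()` is one option among several, not something the tutorials above necessarily do.

```r
w <- faithful$waiting  # built-in dataset from the 'datasets' package

# Loop version, in the spirit of the question
z_loop <- numeric(length(w))
for (k in seq_along(w)) {
  if (w[k] < 79) z_loop[k] <- 1
}

# Vectorized version: the single value 79 is recycled to the length of w
z_vec <- as.numeric(w < 79)

identical(z_loop, z_vec)  # both approaches give the same 0/1 vector

# Factor version; right = FALSE makes the intervals closed on the left,
# so a value of exactly 79 falls in the upper group
waiting_cat <- cut(w, breaks = c(-Inf, 79, Inf), right = FALSE,
                   labels = c("<79", "79+"))

# Cross-tabulate to confirm the factor matches the 0/1 coding
table(waiting_cat, z_vec)
```

Using `-Inf` and `Inf` as the outer breaks avoids computing the minimum and maximum first, and the comparison with `right = FALSE` keeps the boundary value 79 in the "79+" group, matching the `< 79` criterion.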
Code Snippets
> faithful$waiting < 79
[1] FALSE TRUE TRUE TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE FALSE
[13] TRUE TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE
[25] TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE
[37] TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE TRUE FALSE TRUE TRUE
[49] FALSE TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE FALSE
[61] TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE
[73] FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE
[85] TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE
[97] FALSE TRUE TRUE FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE
[109] FALSE FALSE TRUE TRUE FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
[121] TRUE TRUE TRUE TRUE FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
[133] TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE
[145] TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
[157] FALSE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
[169] TRUE FALSE TRUE TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE TRUE
[181] TRUE TRUE FALSE FALSE TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
[193] TRUE FALSE TRUE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE TRUE
[205] TRUE TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE
[217] TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE
[229] TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE FALSE TRUE
[241] TRUE TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE TRUE TRUE FALSE
[253] TRUE TRUE FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
[265] TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE

> as.numeric(faithful$waiting < 79)
[1] 0 1 1 1 0 1 0 0 1 0 1 0 1 1 0 1 1 0 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1
[38] 0 1 0 0 1 0 1 1 0 1 1 0 1 1 0 1 0 1 0 1 1 1 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1
[75] 1 1 1 1 1 0 1 0 1 1 1 0 1 0 1 0 1 0 1 1 1 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1
[112] 1 0 0 1 0 1 0 1 0 1 1 1 1 0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 0 1 0 1 1 1 0 1
[149] 0 1 1 1 1 0 1 1 0 0 1 0 1 0 1 1 1 1 1 0 1 0 1 1 1 1 0 0 1 1 0 1 1 1 0 0 1
[186] 1 0 1 0 1 0 1 1 0 1 0 0 1 1 1 1 0 0 1 1 1 1 0 1 0 1 0 1 1 1 1 1 0 1 1 1 0
[223] 1 1 1 0 1 1 1 0 1 1 0 1 0 1 1 1 0 1 1 1 0 1 0 0 1 0 1 1 1 0 1 1 0 0 1 0 1
[260] 0 1 0 1 0 1 1 1 0 1 0 1 1

> factor(as.numeric(faithful$waiting < 79), levels=c(1,0), labels=c("<79", "79+"))
[1] 79+ <79 <79 <79 79+ <79 79+ 79+ <79 79+ <79 79+ <79 <79 79+ <79 <79 79+
[19] <79 79+ <79 <79 <79 <79 <79 79+ <79 <79 <79 79+ <79 <79 <79 79+ <79 <79
[37] <79 79+ <79 79+ 79+ <79 79+ <79 <79 79+ <79 <79 79+ <79 <79 79+ <79 79+
[55] <79 79+ <79 <79 <79 79+ <79 79+ <79 79+ <79 79+ <79 <79 <79 <79 79+ <79
[73] 79+ <79 <79 <79 <79 <79 <79 79+ <79 79+ <79 <79 <79 79+ <79 79+ <79 79+
[91] <79 79+ <79 <79 <79 <79 79+ <79 <79 79+ <79 79+ <79 79+ 79+ <79 79+ <79
[109] 79+ 79+ <79 <79 79+ 79+ <79 79+ <79 79+ <79 79+ <79 <79 <79 <79 79+ 79+
[127] <79 79+ <79 79+ <79 79+ <79 79+ <79 79+ <79 79+ <79 79+ 79+ <79 79+ <79
[145] <79 <79 79+ <79 79+ <79 <79 <79 <79 79+ <79 <79 79+ 79+ <79 79+ <79 79+
[163] <79 <79 <79 <79 <79 79+ <79 79+ <79 <79 <79 <79 79+ 79+ <79 <79 79+ <79
[181] <79 <79 79+ 79+ <79 <79 79+ <79 79+ <79 79+ <79 <79 79+ <79 79+ 79+ <79
[199] <79 <79 <79 79+ 79+ <79 <79 <79 <79 79+ <79 79+ <79 79+ <79 <79 <79 <79
[217] <79 79+ <79 <79 <79 79+ <79 <79 <79 79+ <79 <79 <79 79+ <79 <79 79+ <79
[235] 79+ <79 <79 <79 79+ <79 <79 <79 79+ <79 79+ 79+ <79 79+ <79 <79 <79 79+
[253] <79 <79 79+ 79+ <79 79+ <79 79+ <79 79+ <79 79+ <79 <79 <79 79+ <79 79+
[271] <79 <79
Levels: <79 79+

> cut(faithful$waiting, breaks=c(-Inf,79,Inf), right=FALSE)
[1] [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf)
[8] [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79)
[15] [79, Inf) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79)
[22] [-Inf,79) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79)
[29] [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79)
[36] [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [79, Inf) [-Inf,79)
[43] [79, Inf) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79) [79, Inf)
[50] [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf)
[57] [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79)
[64] [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [-Inf,79)
[71] [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [-Inf,79)
[78] [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79)
[85] [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79)
[92] [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79)
[99] [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [79, Inf)
[106] [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [79, Inf) [-Inf,79) [-Inf,79)
[113] [79, Inf) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79)
[120] [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf) [79, Inf)
[127] [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79)
[134] [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf)
[141] [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf)
[148] [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf)
[155] [-Inf,79) [-Inf,79) [79, Inf) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79)
[162] [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf)
[169] [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf)
[176] [79, Inf) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79)
[183] [79, Inf) [79, Inf) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf)
[190] [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf)
[197] [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf) [79, Inf)
[204] [-Inf,79) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf)
[211] [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [-Inf,79) [-Inf,79)
[218] [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79)
[225] [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79)
[232] [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79)
[239] [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf)
[246] [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf)
[253] [-Inf,79) [-Inf,79) [79, Inf) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79)
[260] [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79)
[267] [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,

waiting_cat <- cut(faithful$waiting, breaks=c(-Inf,79,Inf), right=FALSE)
Context
StackExchange Code Review Q#6599, answer score: 10