What is the best approach to use in R and why?

Submitted by: @import:stackexchange-codereview

Problem

I'm starting to work with R and have found some tutorials and exercises online.

I want to split one variable into two groups: values greater than or equal to 79, and values smaller than 79.

Perhaps because I'm used to Python, my first approach was to do something like this:

z <- numeric(length(faithful$waiting))
n = 0
for (i in faithful$waiting) {
  n = 1 + n
  if (i < 79) z[n] <- 1
}


But I found many tutorials that use this solution instead:

min_wait <- min(faithful$waiting)-0.1
max_wait <- max(faithful$waiting)
cutof <- c(min_wait,79,max_wait)
waiting_cat <- cut (faithful$waiting, breaks=cutof)


What is the best way to do something like this, and can someone explain why?

Thank you!

Solution

As you realize, your first approach works (it gives a result consistent with the criteria you specify), but it is not idiomatic R. Iterating over the elements of a set/list/vector is idiomatic in Python, and it does have a place in R as well. However, this approach misses two aspects of R: inherent vectorization and the factor data type.

In R, all basic types are vectors and can hold multiple items (of the same type). A single value is just the special case of a length-1 vector. Since everything is a vector, most standard functions and operators are designed to operate on a whole vector at once. They are implicitly vectorized over the elements of the vector, so no explicit for loop is needed to operate on each element. The first simplification is therefore to eliminate the for loop over the elements of faithful$waiting and just do the comparison on the whole vector.

> faithful$waiting < 79
  [1] FALSE  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE
 [13]  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
 [25]  TRUE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE
 [37]  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE
 [49] FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE
 [61]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
 [73] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE
 [85]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
 [97] FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE
[109] FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE
[121]  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE
[133]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE
[145]  TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE
[157] FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE
[169]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE  TRUE
[181]  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
[193]  TRUE FALSE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE
[205]  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
[217]  TRUE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE
[229]  TRUE FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE  TRUE
[241]  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE
[253]  TRUE  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE
[265]  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE
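
As a quick check of that equivalence, here is a minimal sketch on a short made-up vector (`waiting`, `z_loop` and `below` are illustrative names, not part of the original question). The explicit loop and the single vectorized comparison should flag the same elements:

```r
# Hypothetical sample of waiting times, used only for illustration
waiting <- c(79, 54, 74, 62, 85)

# Loop version, in the spirit of the question's first approach
z_loop <- numeric(length(waiting))
for (i in seq_along(waiting)) {
  if (waiting[i] < 79) z_loop[i] <- 1
}

# Vectorized version: one comparison applied to the whole vector
below <- waiting < 79

# Both approaches mark the same positions
identical(z_loop == 1, below)   # TRUE
```

The vectorized form has no counter to maintain and no result vector to pre-allocate, which is most of why it is preferred.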


This brings up another aspect. faithful$waiting has length 272, while 79 has length 1. Argument recycling causes the 79 to be repeated until it is the same length as faithful$waiting. The comparison is then done element-wise, returning a logical vector. If you want it as numeric (as in your first example), it can be converted directly: FALSE becomes 0 and TRUE becomes 1.

> as.numeric(faithful$waiting < 79)
  [1] 0 1 1 1 0 1 0 0 1 0 1 0 1 1 0 1 1 0 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1
 [38] 0 1 0 0 1 0 1 1 0 1 1 0 1 1 0 1 0 1 0 1 1 1 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1
 [75] 1 1 1 1 1 0 1 0 1 1 1 0 1 0 1 0 1 0 1 1 1 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1
[112] 1 0 0 1 0 1 0 1 0 1 1 1 1 0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 0 1 0 1 1 1 0 1
[149] 0 1 1 1 1 0 1 1 0 0 1 0 1 0 1 1 1 1 1 0 1 0 1 1 1 1 0 0 1 1 0 1 1 1 0 0 1
[186] 1 0 1 0 1 0 1 1 0 1 0 0 1 1 1 1 0 0 1 1 1 1 0 1 0 1 0 1 1 1 1 1 0 1 1 1 0
[223] 1 1 1 0 1 1 1 0 1 1 0 1 0 1 1 1 0 1 1 1 0 1 0 0 1 0 1 1 1 0 1 1 0 0 1 0 1
[260] 0 1 0 1 0 1 1 1 0 1 0 1 1
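
Recycling and logical-to-numeric coercion can both be seen on a small scale. The sketch below uses a made-up `waiting` vector (an assumption for illustration only), and the comments show the results I would expect:

```r
# Hypothetical sample of waiting times, used only for illustration
waiting <- c(79, 54, 74, 62, 85)

# The length-1 threshold is recycled to the length of `waiting`,
# so spelling the repetition out by hand gives the same result.
identical(waiting < 79, waiting < rep(79, length(waiting)))   # TRUE

# Recycling also applies when the shorter vector has length > 1.
c(1, 2, 3, 4) + c(10, 20)   # 11 22 13 24

# Logical values coerce to 0/1, so arithmetic on them counts matches.
as.numeric(waiting < 79)    # 0 1 1 1 0
sum(waiting < 79)           # 3
```

Because of that coercion, `sum()` on a logical vector is a common way to count how many elements satisfy a condition.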


The second aspect of R I mentioned is factors. A factor is R's implementation of a data type that can take on any of a predefined set of values; some languages call these enumerated types. Internally, factors are stored as integer indexes into a vector of values (and this sometimes shows through).
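
To make the "integer indexes" remark concrete, here is a small sketch with made-up factors (`f` and `g` are hypothetical names). It shows the stored codes, and one place where they show through unexpectedly:

```r
# A factor with an explicit level order
f <- factor(c("low", "high", "low"), levels = c("low", "high"))
levels(f)       # "low" "high"
as.integer(f)   # 1 2 1  (positions in the levels vector)

# Where the internal codes "show through": numeric-looking labels
g <- factor(c("10", "20", "10"))
as.numeric(g)                 # 1 2 1   (the codes, not the values)
as.numeric(as.character(g))   # 10 20 10 (the values)
```

Converting through `as.character()` first is the usual way to get the labels back as numbers.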

You can create a factor from another vector by defining the levels it can take and, optionally, the labels those levels should be displayed with. Continuing the example:

```
> factor(as.numeric(faithful$waiting < 79), levels=c(1,0), labels=c("<79", "79+"))
  [1] 79+ <79 <79 <79 79+ <79 79+ 79+ <79 79+ <79 79+ <79 <79 79+ <79 <79 79+
 [19] <79 79+ <79 <79 <79 <79 <79 79+ <79 <79 <79 79+ <79 <79 <79 79+ <79 <79
 [37] <79 79+ <79 79+ 79+ <79 79+ <79 <79 79+ <79 <79 79+ <79 <79 79+ <79 79+
 [55] <79 79+ <79 <79 <79 79+ <79 79+ <79 79+ <79 79+ <79 <79 <79 <79 79+ <79
 [73] 79+ <79 <79 <79 <79 <79 <79 79+ <79 79+ <79 <79 <79 79+ <79 79+ <79 79+
 [91] <79 79+ <79 <79 <79 <79 79+ <79 <79 79+ <79 79+ <79 79+ 79+ <79 79+ <79
[109] 79+ 79+ <79 <79 79+ 79+ <79 79+ <79 79+ <79 79+ <79 <79 <79 <79 79+ 79+
[127] <79 79+ <79 79+ <79 79+ <79 79+ <79 79+ <79 79+ <79 79+ 79+ <79 79+ <79
[145] <79 <79 79+ <79 79+ <79 <79 <79 <79 79+ <79 <79 79+ 79+ <79 79+ <79 79+
[163] <79 <79 <79
```

Code Snippets

> factor(as.numeric(faithful$waiting < 79), levels=c(1,0), labels=c("<79", "79+"))
  [1] 79+ <79 <79 <79 79+ <79 79+ 79+ <79 79+ <79 79+ <79 <79 79+ <79 <79 79+
 [19] <79 79+ <79 <79 <79 <79 <79 79+ <79 <79 <79 79+ <79 <79 <79 79+ <79 <79
 [37] <79 79+ <79 79+ 79+ <79 79+ <79 <79 79+ <79 <79 79+ <79 <79 79+ <79 79+
 [55] <79 79+ <79 <79 <79 79+ <79 79+ <79 79+ <79 79+ <79 <79 <79 <79 79+ <79
 [73] 79+ <79 <79 <79 <79 <79 <79 79+ <79 79+ <79 <79 <79 79+ <79 79+ <79 79+
 [91] <79 79+ <79 <79 <79 <79 79+ <79 <79 79+ <79 79+ <79 79+ 79+ <79 79+ <79
[109] 79+ 79+ <79 <79 79+ 79+ <79 79+ <79 79+ <79 79+ <79 <79 <79 <79 79+ 79+
[127] <79 79+ <79 79+ <79 79+ <79 79+ <79 79+ <79 79+ <79 79+ 79+ <79 79+ <79
[145] <79 <79 79+ <79 79+ <79 <79 <79 <79 79+ <79 <79 79+ 79+ <79 79+ <79 79+
[163] <79 <79 <79 <79 <79 79+ <79 79+ <79 <79 <79 <79 79+ 79+ <79 <79 79+ <79
[181] <79 <79 79+ 79+ <79 <79 79+ <79 79+ <79 79+ <79 <79 79+ <79 79+ 79+ <79
[199] <79 <79 <79 79+ 79+ <79 <79 <79 <79 79+ <79 79+ <79 79+ <79 <79 <79 <79
[217] <79 79+ <79 <79 <79 79+ <79 <79 <79 79+ <79 <79 <79 79+ <79 <79 79+ <79
[235] 79+ <79 <79 <79 79+ <79 <79 <79 79+ <79 79+ 79+ <79 79+ <79 <79 <79 79+
[253] <79 <79 79+ 79+ <79 79+ <79 79+ <79 79+ <79 79+ <79 <79 <79 79+ <79 79+
[271] <79 <79
Levels: <79 79+
> cut(faithful$waiting, breaks=c(-Inf,79,Inf), right=FALSE)
  [1] [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf)
  [8] [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79)
 [15] [79, Inf) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79)
 [22] [-Inf,79) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79)
 [29] [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79)
 [36] [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [79, Inf) [-Inf,79)
 [43] [79, Inf) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79) [79, Inf)
 [50] [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf)
 [57] [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79)
 [64] [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [-Inf,79)
 [71] [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [-Inf,79)
 [78] [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79)
 [85] [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79)
 [92] [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79)
 [99] [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [79, Inf)
[106] [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [79, Inf) [-Inf,79) [-Inf,79)
[113] [79, Inf) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79)
[120] [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf) [79, Inf)
[127] [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79)
[134] [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf)
[141] [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf)
[148] [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf)
[155] [-Inf,79) [-Inf,79) [79, Inf) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79)
[162] [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf)
[169] [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf)
[176] [79, Inf) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79)
[183] [79, Inf) [79, Inf) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf)
[190] [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf)
[197] [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf) [79, Inf)
[204] [-Inf,79) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf)
[211] [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [-Inf,79) [-Inf,79)
[218] [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79)
[225] [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79)
[232] [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79)
[239] [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf)
[246] [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79) [-Inf,79) [79, Inf)
[253] [-Inf,79) [-Inf,79) [79, Inf) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79)
[260] [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,79) [-Inf,79)
[267] [-Inf,79) [79, Inf) [-Inf,79) [79, Inf) [-Inf,
waiting_cat <- cut(faithful$waiting, breaks=c(-Inf,79,Inf), right=FALSE)

Context

StackExchange Code Review Q#6599, answer score: 10
