HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavascriptMinor

Naive JS implementation of tidyr's gather function

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
naivefunctionimplementationgathertidyr

Problem

Tidyr is an R package for creating tidy data.
(tl;dr)
In short, it provides an interface (gather) for restructuring a table into a "tidy" format. For our purposes here, it suffices to explain that each row should only contain a value for one observation. The row could contain several fields, but these are fixed factors / categorical variables (at least for the sake of this post.)

Understanding the transformation

Tidyr is intended for tabular data. However, I'm working with D3.js, in which data rows are represented as objects, not arrays.

We'd like to be able to do the following in JavaScript:

const data = [
{Factor1: 'x1', A: 1, B: 74, C: 0.3},
{Factor1: 'x2', A: 2, B: 89, C: 0.12},
{Factor1: 'x3', A: 3, B: 30, C: 0.5}
];

const fields = ['A', 'B', 'C'];
const out = gather(data, 'Factor2', 'Value', ...fields);
out.forEach(item => { console.log(item); });


...which should produce...

Object {Factor2: "A", Value: 1, Factor1: "x1"}
Object {Factor2: "B", Value: 74, Factor1: "x1"}
Object {Factor2: "C", Value: 0.3, Factor1: "x1"}
Object {Factor2: "A", Value: 2, Factor1: "x2"}
Object {Factor2: "B", Value: 89, Factor1: "x2"}
Object {Factor2: "C", Value: 0.12, Factor1: "x2"}
Object {Factor2: "A", Value: 3, Factor1: "x3"}
Object {Factor2: "B", Value: 30, Factor1: "x3"}
Object {Factor2: "C", Value: 0.5, Factor1: "x3"}


Gather

Again, gather provides an interface for performing such transformations. Here's the signature of the original function, from Tidyr's documentation:

gather(data, key, value, ..., na.rm = FALSE, convert = FALSE, factor_key = FALSE)


Above, key and value are the names of the two resulting columns. The column labeled by the key parameter is to take on the names of the headings of the altered columns. This is Factor3 above. Similarly, the column labeled by the value parameter is to take on the values of the same column's value... this is Value in the example above.

Anyway, the code is otherwise the same as originall

Solution

First, I would be concerned about the performance of your three consecutive map operations followed by reduce, particularly as the size of data grows in terms of number of items in array.

Right now, you have to iterate data 4 times to complete all those operations. While the 3x mapping operation probably does provide some clarity to the operations by breaking them down to simple transformation steps, I would consider creating a single map callback that could complete all steps of transformation on an item in one pass, minimizing repetitive iteration of the array. Of course you would probably want to run some performance tests for your expected use cases to understand the performance trade-offs here. It might be that a usage pattern like the following makes sense:

data.map( record => {
    // mapping step 1
    // mapping step 2
    // mapping step 3
    return mappedRecord;
}


Maybe the usage example you show is not very meaningful, but I am having a hard time understanding function signature for gather().

  • What is the difference between the two key and value labels parameters and the columns specified passed from columns array parameter?



  • The signature does not make it clear to me as to which fields in the input data are to be assigned to the Letter and Value properties (for this example) in the output array.



  • The parameter naming doesn't seem to make sense. Why is one called key_label and one called value_label when both are used as keys in the output structure?



  • From looking at this signature, how is one to determine what the logic is that is to be applied in "splitting" apart the input records?



Do the with_fields() and split_record() methods have any value outside the context of the gather() function? If not, consider nesting them inside gather as "private" functions only in context of gather();

Stylistically:

  • I don't like your use of snake_case in javascript, as it is common practice in JS world to use camelCase.



  • Consider sticking to comments before lines of code to they apply, rather than using commments at the end of the line (which I generally avoid, as I feel they make code harder to read).



  • There are a few cases where variable naming is not that meaningful - without_f, with_f. Dropping a few characters from a variable is oftentimes not worth the value you get from having clearly understandable variable names.



  • When you get a more complex or longer return operation on an arrow function, you might consider using bracket syntax and/or line breaks to make the code easier to read.



For example:

.map(([left, right]) => right
  .map(long_pair => Object.assign({}, long_pair, left)))


Could become:

.map(([left, right]) => {
    return right.map(
        long_pair => Object.assign({}, long_pair, left)
    );
});

// or
.map(
    ([left, right]) => right.map(
        long_pair => Object.assign({}, long_pair, left)
    )
 );


To me, this makes it much clearer that there is a nested mapping operation happening here. This is much easier than trying to count/balance opening closing parenthesis in your head as one might have to do when looking at original code that ends with three closing parenthesis in a row.

Code Snippets

data.map( record => {
    // mapping step 1
    // mapping step 2
    // mapping step 3
    return mappedRecord;
}
.map(([left, right]) => right
  .map(long_pair => Object.assign({}, long_pair, left)))
.map(([left, right]) => {
    return right.map(
        long_pair => Object.assign({}, long_pair, left)
    );
});

// or
.map(
    ([left, right]) => right.map(
        long_pair => Object.assign({}, long_pair, left)
    )
 );

Context

StackExchange Code Review Q#145529, answer score: 3

Revisions (0)

No revisions yet.