ChunkerTransformStream, a transform stream to take arbitrary chunk sizes and make them consistent

Submitted by: @import:stackexchange-codereview

Problem

I have some code to interface with a wire protocol that requires data to be inserted into a stream at regular byte intervals. Every 8 KB (or at some other definable interval), a small chunk will be inserted. To make this easy, I decided to create a transform stream that takes a flowing stream and writes fixed-size chunks. That is, this stream can be written to in any size (2 KB here, 500 KB there, 5 bytes next, etc.) and it will always output chunks of exactly 8 KB, apart from a possibly shorter final chunk when the stream ends.

var stream = require('stream');

function ChunkerTransformStream(chunkSize) {
    chunkSize = chunkSize || 8192;

    // Bytes received so far that do not yet fill a whole chunk.
    var buffer = Buffer.alloc(0);

    var chunker = new stream.Transform({objectMode: true});
    chunker._transform = function (chunk, encoding, done) {
        buffer = Buffer.concat([buffer, chunk]);

        // Emit as many full chunks as the accumulated data allows.
        while (buffer.length >= chunkSize) {
            this.push(buffer.slice(0, chunkSize));
            buffer = buffer.slice(chunkSize);
        }

        done();
    };

    chunker._flush = function (done) {
        // Emit any remainder shorter than chunkSize.
        if (buffer.length) {
            this.push(buffer);
        }
        // Always signal completion, even when nothing is left to flush.
        done();
    };

    return chunker;
}

module.exports = ChunkerTransformStream;
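
As a quick sanity check of the behavior described above, the following sketch pushes writes of uneven sizes through a compact restatement of the chunker and records the size of every output chunk. The helper names `makeChunker` and `collectSizes`, the 8-byte chunk size, and the input sizes are illustrative assumptions chosen to keep the output small; they are not part of the original code.

```javascript
const { Transform } = require('stream');

// Compact restatement of the chunker above (hypothetical helper name).
function makeChunker(chunkSize) {
  let buffer = Buffer.alloc(0);
  return new Transform({
    objectMode: true,
    transform(chunk, encoding, done) {
      buffer = Buffer.concat([buffer, chunk]);
      // Emit every full chunk that is currently available.
      while (buffer.length >= chunkSize) {
        this.push(buffer.slice(0, chunkSize));
        buffer = buffer.slice(chunkSize);
      }
      done();
    },
    flush(done) {
      // Emit the remainder, if any, then always signal completion.
      if (buffer.length) this.push(buffer);
      done();
    }
  });
}

// Writes buffers of the given sizes and reports the output chunk sizes
// to the callback once the stream has ended (hypothetical helper).
function collectSizes(inputSizes, chunkSize, callback) {
  const chunker = makeChunker(chunkSize);
  const sizes = [];
  chunker.on('data', (out) => sizes.push(out.length));
  chunker.on('end', () => callback(sizes));
  inputSizes.forEach((n) => chunker.write(Buffer.alloc(n)));
  chunker.end();
}

// 3 + 10 + 4 + 1 = 18 bytes in, regrouped into 8-byte chunks.
collectSizes([3, 10, 4, 1], 8, (sizes) => console.log(sizes)); // [ 8, 8, 2 ]
```

The last chunk is shorter because the flush step emits whatever remains when the input ends.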


This transform stream will be used heavily in my code, with several megabits per second pushed through it. Is this the most efficient way to achieve what I want? I am most concerned about my buffer operations. It's my understanding that Buffer.concat() is very expensive, as it allocates an entirely new buffer and copies both source buffers into it.

Any feedback, on performance or otherwise, is welcomed.

Solution

First of all, your code looks great. It prompted me to study the internal workings of Node's stream API.

The piece that you're concerned with, and rightfully so, is:

buffer = Buffer.concat([buffer, chunk]);


That's the way to do it. There is no other way to do what you want without using bufferjs, which likely handles things the same way or less efficiently.

Now, there is one optimization you can make here. By passing the optional totalLength parameter, as in Buffer.concat(list, [totalLength]), you can avoid an additional loop inside the function, which would otherwise have to add up the lengths of the buffers to determine the length of the new buffer.

Name        | Description                                    | Required? | Type
list        | List of Buffer objects to concat               | Required  | array
totalLength | Total length of the buffers when concatenated. | Optional  | number


"If totalLength is not provided, it is read from the buffers in the list. However, this adds an additional loop to the function, so it is faster to provide the length explicitly." (Source: http://www.w3resource.com/node.js/nodejs-buffer.php)
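
A minimal sketch of the call with an explicit length, using two small example buffers in place of the pending remainder and an incoming chunk (the variable names and contents are illustrative assumptions). Inside the transform, the equivalent call would be Buffer.concat([buffer, chunk], buffer.length + chunk.length).

```javascript
// Two example buffers standing in for the pending remainder and a new chunk.
const pending = Buffer.from('abc');
const chunk = Buffer.from('defgh');

// Passing the combined length means Buffer.concat does not have to
// add up the lengths of the list entries itself.
const joined = Buffer.concat([pending, chunk], pending.length + chunk.length);

// Caveat: a totalLength smaller than the combined length truncates the result.
const truncated = Buffer.concat([pending, chunk], 4);

console.log(joined.toString());    // abcdefgh
console.log(truncated.toString()); // abcd
```

With only two buffers in the list the saved loop is tiny, so any gain is worth measuring before relying on it.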

I'm not sure how much this will help, but other than that, I don't see any other possible optimizations.

Context

StackExchange Code Review Q#57492, answer score: 6
