pattern · cpp · Minor

Optimize YUV channel splitting function

Submitted by: @import:stackexchange-codereview
yuv · function · splitting · optimize · channel

Problem

The input pointer data contains the packed pixel data that needs to be split into separate arrays and stored in yuvInput. Each pixel is 32 bits (4 bytes of 8 bits each), stored in the order U, V, Y, alpha. The (j*4) term steps from one pixel to the next. The 4th byte is the alpha channel, which is skipped (hence there is no idx+3).

This method needs optimization. If anyone is willing to help, please do.

void SplitYUVPlanes(int width, int height, unsigned char *data, int size, unsigned char *yuvInput[3])
{
    // live input *data is YUV444 Packed
    // Conversion from 444 Packed -> 444 Planar
    int index = 0;
    int srcStride = size; // size is the source row stride in bytes (at least width * 4)

    // need to flip image from bottom-up to top-down
    int revheight = height - 1;

    unsigned char* pLuma = yuvInput[0];
    unsigned char* pChromaU = yuvInput[1];
    unsigned char* pChromaV = yuvInput[2];

    for (int i = 0; i < height; ++i)
    {
        // read bottom line first
        int line = (revheight - i) * srcStride;

        for (int j = 0; j < width; ++j)
        {
            int idx = line + (j * 4);
            pLuma[index] = data[idx + 2]; //Y
            pChromaV[index] = data[idx + 1]; //V
            pChromaU[index] = data[idx + 0]; //U
            index++;
        }
    }
}

Solution

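Your multiplication expressions aren't ideal: instead of recomputing (revheight - i) * srcStride and j * 4 on every iteration, you could keep running offsets, adding 4 to the pixel offset in the inner loop and subtracting srcStride from the row offset in the outer loop. That might be faster, although an optimizing compiler will often perform this strength reduction on its own.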
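
A minimal sketch of that suggestion, assuming (as the original code implies) that size is the source row stride in bytes and that each packed pixel is stored as U, V, Y, alpha. The function name SplitYUVPlanesIncremental is hypothetical; it should produce the same output as SplitYUVPlanes, and whether it is actually faster has to be checked with a benchmark.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical variant of SplitYUVPlanes without per-pixel multiplications:
// the source offset grows by 4 per pixel and shrinks by one stride per row.
// A signed integer offset is used instead of a pointer, so stepping "below"
// row 0 after the last row is harmless.
void SplitYUVPlanesIncremental(int width, int height, const unsigned char *data,
                               int size, unsigned char *yuvInput[3])
{
    assert(width >= 0 && height >= 0 && size >= width * 4);

    unsigned char *pLuma = yuvInput[0];
    unsigned char *pChromaU = yuvInput[1];
    unsigned char *pChromaV = yuvInput[2];

    // Start at the bottom row to flip the image from bottom-up to top-down.
    std::ptrdiff_t line = static_cast<std::ptrdiff_t>(height - 1) * size;
    std::ptrdiff_t out = 0;

    for (int i = 0; i < height; ++i, line -= size)
    {
        std::ptrdiff_t idx = line;
        for (int j = 0; j < width; ++j, idx += 4, ++out)
        {
            pLuma[out]    = data[idx + 2]; // Y
            pChromaV[out] = data[idx + 1]; // V
            pChromaU[out] = data[idx + 0]; // U (idx + 3 is alpha, skipped)
        }
    }
}
```

The call signature is unchanged, so it can be swapped in for the original and compared on the same input.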

Apart from that, and this is only a guess, the following might in theory make it faster:

  • Have three loops: write to Luma in the 1st loop, to ChromaV in the 2nd loop, and to ChromaU in the 3rd loop.
  • Calibrate the length of each loop (the number of bytes it moves) so that the chunk of input data it reads stays in cache for all three loops.
  • Use some kind of read-ahead, so that each byte has already been read before it needs to be written, for example:
      • Read 1st byte
      • Read 2nd byte
      • Read 3rd byte
      • Read 4th byte
      • Write 1st byte
      • Read 5th byte
      • Write 2nd byte
      • etc.
  • Read 4 or 8 bytes at a time, for example into a union of a struct with 4 or 8 byte-sized fields and an int32 or int64 field.
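
The first two bullets could be combined roughly as follows. This is an illustration, not a tuned implementation: for each chunk of a source row it runs three separate loops, one per output plane, so each loop writes a single sequential stream while the chunk it reads is likely still in cache. The function name SplitYUVPlanesThreePass and the chunk size kChunkPixels (1024 pixels, i.e. 4 KiB of packed input) are hypothetical; a good chunk size would have to be found by measuring on the target hardware.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tuning constant: pixels per chunk (1024 * 4 bytes = 4 KiB of input).
constexpr int kChunkPixels = 1024;

// For each chunk of a source row: one loop writes Y, a second writes V,
// a third writes U. The 2nd and 3rd loops re-read the same chunk.
void SplitYUVPlanesThreePass(int width, int height, const unsigned char *data,
                             int size, unsigned char *yuvInput[3])
{
    assert(width >= 0 && height >= 0 && size >= width * 4);

    unsigned char *pLuma = yuvInput[0];
    unsigned char *pChromaU = yuvInput[1];
    unsigned char *pChromaV = yuvInput[2];

    for (int i = 0; i < height; ++i)
    {
        // Read the bottom row first (bottom-up -> top-down flip).
        const unsigned char *row =
            data + static_cast<std::ptrdiff_t>(height - 1 - i) * size;
        const std::ptrdiff_t outRow = static_cast<std::ptrdiff_t>(i) * width;

        for (int start = 0; start < width; start += kChunkPixels)
        {
            const int end = (width - start < kChunkPixels) ? width : start + kChunkPixels;
            for (int j = start; j < end; ++j) pLuma[outRow + j]    = row[j * 4 + 2]; // Y
            for (int j = start; j < end; ++j) pChromaV[outRow + j] = row[j * 4 + 1]; // V
            for (int j = start; j < end; ++j) pChromaU[outRow + j] = row[j * 4 + 0]; // U
        }
    }
}
```

Each inner loop has a simple, fixed access pattern, which may also make it easier for the compiler to vectorize; that is a possibility to verify, not a guarantee.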
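
The read-ahead bullet amounts to software pipelining. In the hedged sketch below (the name SplitYUVPlanesReadAhead is hypothetical), the three bytes of the next pixel are loaded into locals before the previous pixel is written out. Out-of-order CPUs and optimizing compilers often reorder loads and stores like this on their own, so the explicit version may make no measurable difference.

```cpp
#include <cassert>
#include <cstddef>

// Loads pixel j into locals, then stores pixel j - 1, so the loads for the
// next pixel are issued before the current pixel is written.
void SplitYUVPlanesReadAhead(int width, int height, const unsigned char *data,
                             int size, unsigned char *yuvInput[3])
{
    assert(width >= 0 && height >= 0 && size >= width * 4);
    if (width == 0) return;

    unsigned char *pLuma = yuvInput[0];
    unsigned char *pChromaU = yuvInput[1];
    unsigned char *pChromaV = yuvInput[2];

    for (int i = 0; i < height; ++i)
    {
        // Bottom row first (bottom-up -> top-down flip).
        const unsigned char *px =
            data + static_cast<std::ptrdiff_t>(height - 1 - i) * size;

        // Prime the pipeline with the first pixel of the row.
        unsigned char u = px[0], v = px[1], y = px[2];
        for (int j = 1; j < width; ++j)
        {
            px += 4;
            const unsigned char nu = px[0], nv = px[1], ny = px[2]; // read pixel j
            *pLuma++ = y; *pChromaV++ = v; *pChromaU++ = u;       // write pixel j - 1
            u = nu; v = nv; y = ny;
        }
        // Drain: write the last pixel of the row.
        *pLuma++ = y; *pChromaV++ = v; *pChromaU++ = u;
    }
}
```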
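
For the last bullet, note that in C++ reading a union member other than the one last written is undefined behaviour (it is allowed in C), so this sketch swaps in std::memcpy into a std::uint64_t, which compilers typically turn into a single 8-byte load. It reads two packed pixels per load and extracts the bytes with shifts; the byte order of the integer is checked at run time. The names SplitYUVPlanesWide and IsLittleEndian are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// True if the least significant byte of an integer is stored first in memory.
static bool IsLittleEndian()
{
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Loads two pixels (U V Y A U V Y A) with one 8-byte read and splits them.
void SplitYUVPlanesWide(int width, int height, const unsigned char *data,
                        int size, unsigned char *yuvInput[3])
{
    assert(width >= 0 && height >= 0 && size >= width * 4);

    unsigned char *pLuma = yuvInput[0];
    unsigned char *pChromaU = yuvInput[1];
    unsigned char *pChromaV = yuvInput[2];
    const bool little = IsLittleEndian();

    // Byte k (0 = lowest address) of an 8-byte word filled by memcpy.
    auto byteAt = [little](std::uint64_t w, int k) {
        const int shift = little ? 8 * k : 8 * (7 - k);
        return static_cast<unsigned char>(w >> shift);
    };

    for (int i = 0; i < height; ++i)
    {
        // Bottom row first (bottom-up -> top-down flip).
        const unsigned char *row =
            data + static_cast<std::ptrdiff_t>(height - 1 - i) * size;
        int j = 0;
        for (; j + 1 < width; j += 2)
        {
            std::uint64_t w;
            std::memcpy(&w, row + j * 4, sizeof w); // pixels j and j + 1
            *pChromaU++ = byteAt(w, 0); *pChromaV++ = byteAt(w, 1); *pLuma++ = byteAt(w, 2);
            *pChromaU++ = byteAt(w, 4); *pChromaV++ = byteAt(w, 5); *pLuma++ = byteAt(w, 6);
        }
        if (j < width) // odd width: copy the last pixel byte by byte
        {
            const unsigned char *px = row + j * 4;
            *pChromaU++ = px[0]; *pChromaV++ = px[1]; *pLuma++ = px[2];
        }
    }
}
```

The odd-width tail matters: without it, an 8-byte read of the last pixel would run past the end of the row.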

Context

StackExchange Code Review Q#41461, answer score: 2
