Optimize YUV channel splitting function
Problem
The input pointer data contains the data that needs to be split into different arrays and put in yuvInput. Each pixel is 32 bits (4 bytes of 8 bits each). Note the (j*4) to step through the row one pixel at a time. The 4th byte is the alpha channel, which gets skipped (the reason there is no idx+3). This method needs optimization. If anyone is willing to help, please do.
void SplitYUVPlanes(int width, int height, unsigned char *data, int size, unsigned char *yuvInput[3])
{
    // live input *data is YUV444 Packed
    // Conversion from 444 Packed -> 444 Planar
    int index = 0;
    int srcStride = size;
    // need to flip image from bottom-up to top-down
    int revheight = height - 1;
    unsigned char* pLuma = yuvInput[0];
    unsigned char* pChromaU = yuvInput[1];
    unsigned char* pChromaV = yuvInput[2];
    for (int i = 0; i < height; ++i)
    {
        // read bottom line first
        int line = (revheight - i) * srcStride;
        for (int j = 0; j < width; ++j)
        {
            int idx = line + (j * 4);
            pLuma[index] = data[idx + 2]; //Y
            pChromaV[index] = data[idx + 1]; //V
            pChromaU[index] = data[idx + 0]; //U
            index++;
        }
    }
}
Solution
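The multiplications in your index calculations aren't ideal. Instead (and this might be faster), you could derive each value from the previous one: add 4 to idx on each pass through the inner loop, and subtract srcStride from line on each pass through the outer loop.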
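The incremental addressing could be sketched roughly as below. This is only an illustration, not a measured improvement: the name SplitYUVPlanesIncremental is made up here, and it assumes (as the question's code implies) that the stride is the length of one source row in bytes and that each pixel is laid out as U, V, Y, A. A modern compiler may already perform this strength reduction on the original loop, so profile before relying on it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical variant of SplitYUVPlanes: same output, but the per-pixel
// multiplications are replaced by running offsets and pointers.
// Assumes srcStride is the byte length of one source row and that each
// pixel occupies 4 bytes in the order U, V, Y, A.
void SplitYUVPlanesIncremental(int width, int height,
                               const unsigned char* data, int srcStride,
                               unsigned char* yuvInput[3])
{
    assert(width >= 0 && height >= 0);

    unsigned char* pLuma    = yuvInput[0];
    unsigned char* pChromaU = yuvInput[1];
    unsigned char* pChromaV = yuvInput[2];

    // Byte offset of the bottom row; the image is flipped while copying.
    std::ptrdiff_t line = static_cast<std::ptrdiff_t>(height - 1) * srcStride;

    for (int i = 0; i < height; ++i)
    {
        const unsigned char* src = data + line;
        for (int j = 0; j < width; ++j)
        {
            *pChromaU++ = src[0]; // U
            *pChromaV++ = src[1]; // V
            *pLuma++    = src[2]; // Y
            src += 4;             // next pixel; the alpha byte is skipped
        }
        line -= srcStride;        // move up one row
    }
}
```

The row position is kept as an integer offset rather than a pointer so that the final decrement never forms a pointer in front of the buffer.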
Apart from that (and this is just a guess), a few other things might in theory make this faster:
- Have three loops: write to Luma in the 1st loop, then write to ChromaV in the 2nd loop, and write to ChromaU in the 3rd loop.
- Calibrate the length of (number of bytes moved in) each loop, so that the chunk of input data stays in cache for each of the three loops.
- Use some kind of read-ahead, to ensure that the byte is read before it needs to be written, for example:
- Read 1st byte
- Read 2nd byte
- Read 3rd byte
- Read 4th byte
- Write 1st byte
- Read 5th byte
- Write 2nd byte
- etc.
- Read 4 or 8 bytes at a time, for example into a struct which contains 4 or 8 byte-fields and is a union with an int32 or int64 field.
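As a rough sketch of the last suggestion, one row could be processed by copying each pixel into a 4-byte struct in a single read. Note the swap: instead of a union with an int32 field (reading the inactive member of a union is not well-defined in C++), this uses std::memcpy into the struct, which compilers typically turn into one 32-bit load. PackedPixel and SplitRowWordwise are hypothetical names, and the U, V, Y, A field order is assumed from the question's code. Whether this beats the byte-wise loop is untested.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical 4-byte pixel record; field order assumed from the question.
struct PackedPixel
{
    unsigned char u, v, y, a;
};
static_assert(sizeof(PackedPixel) == 4, "expected a tightly packed 4-byte pixel");

// Splits one row of `width` packed pixels into three planar outputs,
// reading each pixel as a single 4-byte block.
void SplitRowWordwise(const unsigned char* src, int width,
                      unsigned char* luma,
                      unsigned char* chromaU,
                      unsigned char* chromaV)
{
    assert(width >= 0);

    for (int j = 0; j < width; ++j)
    {
        PackedPixel px;
        std::memcpy(&px, src, sizeof px); // one 4-byte read
        luma[j]    = px.y;
        chromaU[j] = px.u;
        chromaV[j] = px.v;                // px.a (alpha) is ignored
        src += sizeof px;
    }
}
```

Because the struct has byte-sized fields, the result does not depend on the machine's endianness, which extracting bytes from an integer by shifting would.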
Context
StackExchange Code Review Q#41461, answer score: 2