patterncppModerate
Huffman decoding for video
Viewed 0 times
videohuffmanfordecoding
Problem
I've been trying to implement a fast Huffman decoder in order to encode/decode video. However, I'm barely able to decode a 1080p50 video using my decoder. On the other hand, there are lots of codecs in ffmpeg that entropy decode 4-8 times faster.
I have been trying to optimize and profile my code, but I don't think I can get it to run much faster. Does anyone have any suggestion as to how one can optimize Huffman decoding?
My profiler says my application is spending most of the time in the following code:
Here is the entire code:
```
void decode_huff(void input, uint8_t dest)
{
struct node
{
node* children; // 0 right, 1 left
uint8_t value;
bool step;
};
CACHE_ALIGN node nodes[512] = {};
node* nodes_end = nodes+1;
auto data = reinterpret_cast(input);
size_t table_size = *(data++); // Size is first 32 bits.
size_t num_comp = *(data++); // Data size is second 32 bits.
bit_reader table_reader(data);
unsigned char n_bits = ((table_reader.next_bit() stack;
stack.push(nodes); // "nodes" is root
while(!stack.empty())
{
node* ptr = stack.top();
stack.pop();
if(table_reader.next_bit())
{
ptr->step = true;
ptr->children = nodes->children;
for(int n = n_bits; n >= 0; --n)
ptr->value |= table_reader.next_bit() children = nodes_end++;
nodes_end++;
stack.push(ptr->children+0);
stack.push(ptr->children+1);
}
}
// Decode huffman-data
// THIS IS THE SLOW PART
auto huffman_data = reinterpret_cast(input) + (table_size+32)/32;
size_t data_size = *(huffman_data++); // Size is
I have been trying to optimize and profile my code, but I don't think I can get it to run much faster. Does anyone have any suggestion as to how one can optimize Huffman decoding?
My profiler says my application is spending most of the time in the following code:
current = current->children + data_reader.next_bit();
*ptr = current->value;
ptr = ptr + current->step;Here is the entire code:
```
void decode_huff(void input, uint8_t dest)
{
struct node
{
node* children; // 0 right, 1 left
uint8_t value;
bool step;
};
CACHE_ALIGN node nodes[512] = {};
node* nodes_end = nodes+1;
auto data = reinterpret_cast(input);
size_t table_size = *(data++); // Size is first 32 bits.
size_t num_comp = *(data++); // Data size is second 32 bits.
bit_reader table_reader(data);
unsigned char n_bits = ((table_reader.next_bit() stack;
stack.push(nodes); // "nodes" is root
while(!stack.empty())
{
node* ptr = stack.top();
stack.pop();
if(table_reader.next_bit())
{
ptr->step = true;
ptr->children = nodes->children;
for(int n = n_bits; n >= 0; --n)
ptr->value |= table_reader.next_bit() children = nodes_end++;
nodes_end++;
stack.push(ptr->children+0);
stack.push(ptr->children+1);
}
}
// Decode huffman-data
// THIS IS THE SLOW PART
auto huffman_data = reinterpret_cast(input) + (table_size+32)/32;
size_t data_size = *(huffman_data++); // Size is
Solution
You seem to work on single bit basis. This is very slow. You have to combine several bits together, look up the appropriate pattern in the table and then continue from there (if necessary) in the tree.
You can see an outline to this solution presented here http://www.gzip.org/algorithm.txt. Of course, googling for "fast huffman decoding" reveals several papers on that subject as well. Reading them might be worthy as well.
You can see an outline to this solution presented here http://www.gzip.org/algorithm.txt. Of course, googling for "fast huffman decoding" reveals several papers on that subject as well. Reading them might be worthy as well.
Context
StackExchange Code Review Q#4479, answer score: 13
Revisions (0)
No revisions yet.