HiveBrain v1.2.0
Get Started
← Back to all entries
patternCritical

Huffman encoding: why is there no need for a separator?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
whyhuffmanneedseparatorforencodingthere

Problem

Char        Code
====        ====
E           0000
i           0001
y           0010
l           0011
k           0100
.           0101
space       011
e           10
r           1100
s           1101
n           1110
a           1111


Original text:


Eerie eyes seen near lake

Encoded:

0000101100000110011100010101101101001111101011111100011001111110100100101

Why is there no need for a separator in the Huffman encoding?

Solution

You don't need a separator because Huffman codes are prefix-free codes (also, unhelpfully, known as "prefix codes"). This means that no codeword is a prefix of any other codeword. For example, the codeword for "e" in your example is 10, and you can see that no other codewords begin with the digits 10.

This means that you can decode greedily by reading the encoded string from left to right and outputting a character as soon as you've seen a codeword. For example, 0, 00 and 000 don't code anything so you keep reading bits. When you read 0000, that encodes "E" and, because the code is prefix-free, you know there's no other codeword 0000x, so you can now output "E" and start to read the next codeword. Again, 1 doesn't encode anything but 10 encodes "e". No other codewords begins with "10", so you can output "e". And so on.

Context

StackExchange Computer Science Q#56615, answer score: 53

Revisions (0)

No revisions yet.