HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavaMinor

Huffman compressor in Java

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
compressorhuffmanjava

Problem

(See the next iteration as well.)

I have this Java program that can en-/decode files, both text and binary. What comes to critique I want to hear anything regarding these points:

  • Performance,



  • Modularity,



  • Coding conventions,



  • Naming conventions.



Code

BitString.java

```
package net.coderodde.compression.huffman;

import java.util.Arrays;

/**
* This class implements a builder for creating bit strings.
*
* @author Rodion "rodde" Efremov
* @version 1.618 (Nov 14, 2016)
*/
public final class BitString {

/**
* The length of a {@code long} array storing the bits.
*/
private static final int DEFAULT_NUMBER_OF_LONGS = 2;

/**
* Used for more efficient modulo arithmetics when indexing a particular
* bit in a {@code long} value.
*/
private static final int MODULO_MASK = 0b111111;

/**
* Number of bits per a {@code long} value;
*/
private static final int BITS_PER_LONG = Long.BYTES * Byte.SIZE;

/**
* This field holds the actual array of long values for storing the bits.
*/
private long[] storageLongs;

/**
* Current maximum of bits this builder can store.
*/
private int storageCapacity;

/**
* Stores the number of bits this bit string builder actually represents.
We have an invariant {@code size >> Byte.SIZE (i % Long.BYTES)) & 0xff);
}

return byteArray;
}

@Override
public String toString() {
StringBuilder sb = new StringBuilder(size);

for (int i = 0; i != size; ++i) {
sb.append(readBitImpl(i) ? '1': '0');
}

return sb.toString();
}

@Override
public int hashCode() {
int hash = 0;
int fullLongs = size / BITS_PER_LONG;

for (int i = 0; i != fullLongs; ++i) {
long word = storageLongs[i];
hash += (int)((word & 0xffffffff00000000L) ^
(word & 0xffffffff));
}

int leftoverBit

Solution

Consider walking the tree to decode

For encoding, using a map is smart, as you can convert your input byte to an output bitstring in a single lookup.

However, for decoding, you don't really know when your bitstring is going to end. Your code does one (unsuccessful) lookup for each bit in the bitstring until the bitstring ends. So if 011011 mapped to 'A', you'd do six total lookups. If you instead walked the huffman tree starting at the root, then for each bit you would move to the left or right child and check whether that node was a leaf. I think that this would be faster because it would be a node traversal versus a map lookup.
Even faster using FSM

I think you can get even faster decode times using a finite state machine. There is an article here that describes one implementation.

Context

StackExchange Code Review Q#147509, answer score: 4

Revisions (0)

No revisions yet.