patternjavascriptMinor
Huffman code generator in Typescript
Viewed 0 times
codetypescripthuffmangenerator
Problem
Write a program that takes any input text and produces both a
frequency table and the corresponding Huffman code.
Take approximately 360 words from any English document as your input
text. Ignore all blanks, all punctuation marks, all special symbols.
Create an input file with this input text.
Construct the frequency table according to the input text read from
the file, in the form:
The Frequency's MUST be listed, in order, from largest (at the
top) to smallest (at the bottom).
Only the BELOW Tablet Format will be accepted: Letter Comma Space
Percentage
Example: A, 2.5%
symbol frequency
A, .
. .
. .
m, .
. .
. .
7, .
Then, using the Huffman algorithm, construct the optimal prefix
binary code for the table.
Can somebody please review and suggest any needed modifications to optimize the performance?
```
import * as fs from 'fs';
import * as path from 'path';
import * as readline from 'readline';
class HuffmanNodeObject {
private _character: string = '';
private _frequency: number = 0;
private _binaryCode: string = '';
private _totalBits: number = 0;
private _leftNode: HuffmanNodeObject = null;
private _rightNode: HuffmanNodeObject = null;
public getCharacter(): string {
return this._character;
}
public setCharacter(value: string): void {
this._character = value;
}
public getFrequency(): number {
return this._frequency;
}
public setFrequency(value: number): void {
this._frequency = value;
}
public getBinaryCode(): string {
return this._binaryCode;
}
public setBinaryCode(value: string): void {
this._binaryCode = value;
}
public getTotalBits(): number {
return this._totalBits;
}
public setTotaBits(value: number): void {
this._totalBits = value;
}
public getLeftNode(): HuffmanNodeObject {
retur
frequency table and the corresponding Huffman code.
Take approximately 360 words from any English document as your input
text. Ignore all blanks, all punctuation marks, all special symbols.
Create an input file with this input text.
Construct the frequency table according to the input text read from
the file, in the form:
The Frequency's MUST be listed, in order, from largest (at the
top) to smallest (at the bottom).
Only the BELOW Tablet Format will be accepted: Letter Comma Space
Percentage
Example: A, 2.5%
symbol frequency
A, .
. .
. .
m, .
. .
. .
7, .
Then, using the Huffman algorithm, construct the optimal prefix
binary code for the table.
Can somebody please review and suggest any needed modifications to optimize the performance?
```
import * as fs from 'fs';
import * as path from 'path';
import * as readline from 'readline';
class HuffmanNodeObject {
private _character: string = '';
private _frequency: number = 0;
private _binaryCode: string = '';
private _totalBits: number = 0;
private _leftNode: HuffmanNodeObject = null;
private _rightNode: HuffmanNodeObject = null;
public getCharacter(): string {
return this._character;
}
public setCharacter(value: string): void {
this._character = value;
}
public getFrequency(): number {
return this._frequency;
}
public setFrequency(value: number): void {
this._frequency = value;
}
public getBinaryCode(): string {
return this._binaryCode;
}
public setBinaryCode(value: string): void {
this._binaryCode = value;
}
public getTotalBits(): number {
return this._totalBits;
}
public setTotaBits(value: number): void {
this._totalBits = value;
}
public getLeftNode(): HuffmanNodeObject {
retur
Solution
Starting point
Unfortunately, I am not familiar enough with node.js to make high confidence claims about the code. My guess is that I/O operations are more likely to be the bottleneck in the code rather anything else. Even the
Experiment with the way the data is being read. Now the code does this:
while (null !== (_chunk = _readable.read(1)))
but what if the read asked for a block of characters instead of a single character, and then iterated through the characters one-by-one? Usually, it is way more efficient. However, the documentation says that the data is being read from the buffer, so there may not be any performance improvement at all. I'd still give it a try for the sake of curiosity.
Another thing which might be slow is
Minor performance improvement
There's no need in three
Possible bug
This seems to be a bug, but could be that I simply misunderstand the intent. The construction
Getters/setters in HuffmanNodeObject
In my humble opinion, they are not providing any value at all. It's okay to expose the fields (
Unfortunately, I am not familiar enough with node.js to make high confidence claims about the code. My guess is that I/O operations are more likely to be the bottleneck in the code rather anything else. Even the
GenerateHuffMannTree does not look too heavy computationally.Experiment with the way the data is being read. Now the code does this:
while (null !== (_chunk = _readable.read(1)))
but what if the read asked for a block of characters instead of a single character, and then iterated through the characters one-by-one? Usually, it is way more efficient. However, the documentation says that the data is being read from the buffer, so there may not be any performance improvement at all. I'd still give it a try for the sake of curiosity.
Another thing which might be slow is
console.[log|error]. Again, I'm not sure whether it's really expensive in node.js but it might be, since it deals with I/O. Try benchmarking your program without any console output invocation. If it works faster, you may collect all the messages in memory while doing the job, and dump them all at once when the program is done (since your input is really small -- 360 words).Minor performance improvement
There's no need in three
ifs, and try-catches in SortCollectionInDescending. Assuming that all node objects in the provided array are in a good shape, you can sort using a fat arrow function that relies on simple subtraction operation. This approach is very commonly used for comparison.private SortCollectionInDescending(huffMannNodeCollection: Array): Array {
return huffMannNodeCollection.sort((firstNode, secondNode) => secondNode.getFrequency() - firstNode.getFrequency());
}Possible bug
This seems to be a bug, but could be that I simply misunderstand the intent. The construction
a += a + b is strange and not idiomatic (usually, we either see a += b or a = a + b).this._totalCodeLength += this._totalCodeLength + (_tempHuffmannObject.getFrequency() * _tempHuffmannObject.getBinaryCode().length);Getters/setters in HuffmanNodeObject
In my humble opinion, they are not providing any value at all. It's okay to expose the fields (
_character, _frequency, etc.) and treat this class as a value type (or a "structure" if you will). This will allow removing the get/set, thus shortening the code by about 50 lines. The class works as data holder and has absolutely no logic, therefore no need to complicate things. I don't think this code change will lead to any noticeable performance gains.Code Snippets
private SortCollectionInDescending(huffMannNodeCollection: Array<HuffmanNodeObject>): Array<HuffmanNodeObject> {
return huffMannNodeCollection.sort((firstNode, secondNode) => secondNode.getFrequency() - firstNode.getFrequency());
}this._totalCodeLength += this._totalCodeLength + (_tempHuffmannObject.getFrequency() * _tempHuffmannObject.getBinaryCode().length);Context
StackExchange Code Review Q#145637, answer score: 2
Revisions (0)
No revisions yet.