HiveBrain v1.2.0
Get Started
← Back to all entries
patterncMinor

Decompression implementation II

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
implementationdecompressionstackoverflow

Problem

Here is an update to my previous thread which you can find here. Again, I'd very much appreciate any advice or comments about the structure/logic of the program, style of the code, or reusability of the features.

The fread_twelve function has been heavily modified to deal with detecting and reading the last entry, which is padded from 12 to 16 bits. Let me know what you think!

I'm also quite eager to hear how I can balance the heap. Valgrind says I have "778 allocs, 354 frees" on one run, but I thought I'd freed all my allocations!

/**
 *  file: lzw.h
 *
 *  function declarations for lempel ziv-welch decompression algorithm.
 */

#ifndef LZW_H
#define LZW_H

#include 
#include 

typedef uint8_t byte;

/**
 *  Frees all blocks between 'from' and 'to' (inclusive) in the dict.
 */
void free_dict (byte* dict[], int from, int to);

/**
 *  Enters the first 256 ASCII values into dict, and their respective sizes into sizes.
 */
void load_dict_defaults (byte* dict[], int sizes[]);

/**
 *  Reads 12 bits from source and returns them as an int.
 */
int fread_twelve(FILE* source);

/**
 *  LZW decompression of source to dest.
 *  Returns 0 if successful, a positive integer otherwise.
 */
int decompress (FILE* source, FILE* dest);

#endif // LZW_H


```
/**
* file: decompress.c
*
* implementation of lempel ziv-welch decompression algorithm using 12-bit fixed-width compression format.
*
* usage: decompress source output
*/

#include "lzw.h"
#include
#include

int main (int argc, char* argv[])
{
// check for correct number of args
if (argc != 3)
{
printf("Usage: decompress source output\n");
return 1;
}

// open compressed file
FILE* source = fopen(argv[1], "rb");
if (source == NULL)
{
printf("Error: cannot open %s\n", argv[1]);
return 1;
}

// open destination file
FILE* dest = fopen(argv[2], "wb");
if (dest == NULL)
{
printf("Error: cannot open %s\n", argv[2]);

Solution

A few recommendations you might find useful:

Cleanup the global/static data:

Right now your program make use of some global/static data. E.g.: dictionary, dict_entry_sizes, prevBytes and left (anything else that I've missed?).

This poses a few limitations. For instance, your decompression routines cannot be run in parallel, since there is only one shared dictionary and friends, so running that code in different threads would result in data races. A good compression library should definitely be suitable for concurrent use!

Another potential issue is that it would be much harder to expand your code to support multiple compressors that can be paused and resumed, again because they would all share the same common global variables.

To improve this, you could make use of a context structure for your compressor. If you haven't experimented with structs in C yet, this should be a great opportunity. Move all the global and static variables into a structure declared it in your header file. Example:

struct LzwCompression
{
    bool left;
    byte prevBytes[2];
    byte* dictionary[4096];
    int dict_entry_sizes[4096];
};


Now each of the decompression functions must take a pointer to this context struture and source the variables from it, instead of reading the globals. Example with decompress():

int decompress (LzwCompression* lzw, FILE* source, FILE* dest)
{
    load_dict_defaults(lzw->dictionary, lzw->dict_entry_sizes);
    lzw->left = true;
    ...


And the caller would then provide an instance of this structure to the decompression functions:

LzwCompression lzw;
decompress(&lzw, sourceFile, destFile);

// You can have as many 'LzwCompressions' as you need, without the danger of they stepping on each other's toes.


After this change is applied, be sure to properly initialize the the structure. Your code probably relies on the globals being zero initialized. The compiler does that for you for global variables, but a struct declared at function scope is not cleared by default. So you might memset() the LzwCompression instance either on main() or right at the top of decompress(). Also set the left field to true after memseting the structure, since that was the starting value of the global originally.

Hide implementation details:

Internal functions that are only relevant to the implementation in the .c file should not be exposed in your header file. Keep header files as lean and mean as you can to streamline compile times. I believe the only function the user of your decompression cares about is decompress(), so remove the prototypes of the other ones from the header file and place those functions at the top of your .c file, marking them as static. In this case the static keyword means that the function is local, so the linker doesn't export it. It is not quite the same meaning of a static variable, so you don't have to worry with shared/global data.

// lzw.c

static void load_dict_defaults (byte* dict[], int sizes[])
{
    ...
}

static void free_dict (byte* dict[], int from, int to)
{
    ...
}

.. etc ...


Odds and ends:

-
I see that you already use the bool type, so extend that to the return type of functions that only have two possible outcomes, success (true) and failure (false). Returning integer error codes is error prone for the programmer since there isn't a single agreement or convention on which values should be returned. A boolean is much more obvious of its meaning.

-
Go easy on the magic numbers. You have a lot of repeated literal constants out there that could be consolidated into a few named constants. #define and enum are the traditional C ways of doing that, but I'd instead recommend preferring static consts whenever possible to gain some type safety.

-
You correctly use stderr in your decompression routines to print errors, but in main() all error reporting is done with printf, which outputs to stdout. Error output should always be sent to stderr, so that users can easily filter the program output. Replace those error prints with stderr. You might also consider adding the output of strerror() to the messages involving system errors (like fopen() and friends).

Well, that's all the juice I have! Compression is not my field, so I don't know how you can improve the algorithm. Hopefully someone else will provide some comments on that.

As for the memory leaks you mention, I haven't tested, sorry. If the output of Valgrind is not helping you much, you might try the Clang LeakSanitizer which is a very powerful tool with a super detailed error output that should make it very easy to find the source of the leaks.

EDIT: In a second thought, I think I do know where your memory leaks are:

```
void load_dict_defaults (byte* dict[], int sizes[])
{
byte* temp;
byte c = 0;
for (int i = 0; i < 256; i++)
{
temp = malloc(1); // <-- is this ever freed somewhere?
te

Code Snippets

struct LzwCompression
{
    bool left;
    byte prevBytes[2];
    byte* dictionary[4096];
    int dict_entry_sizes[4096];
};
int decompress (LzwCompression* lzw, FILE* source, FILE* dest)
{
    load_dict_defaults(lzw->dictionary, lzw->dict_entry_sizes);
    lzw->left = true;
    ...
LzwCompression lzw;
decompress(&lzw, sourceFile, destFile);

// You can have as many 'LzwCompressions' as you need, without the danger of they stepping on each other's toes.
// lzw.c

static void load_dict_defaults (byte* dict[], int sizes[])
{
    ...
}

static void free_dict (byte* dict[], int from, int to)
{
    ...
}

.. etc ...
void load_dict_defaults (byte* dict[], int sizes[])
{
    byte* temp;
    byte c = 0;
    for (int i = 0; i < 256; i++)
    {
        temp = malloc(1); // <-- is this ever freed somewhere?
        temp[0] = c++;
        dict[i] = temp;
        dict_entry_sizes[i] = 1;
    }
}

Context

StackExchange Code Review Q#98331, answer score: 5

Revisions (0)

No revisions yet.