HiveBrain v1.2.0
Get Started
← Back to all entries
patterncMinor

File chunk buffer for Windows programs

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
chunkfileforwindowsprogramsbuffer

Problem

The purpose of the ChunkBuffer code below is to designate a "chunk" from a given input file and to loop that chunk (if needed).

Its operation is similar to the following pseudocode, with the important difference that the real code uses a heap-allocated buffer for performance:

chunk c;

chunk_init(&c, input_file, begin_offset, size_of_chunk);

while (/ still need to read bytes /)
{
byte b = chunk_next_byte(&c); // may loop to
begin_offset
}


The ChunkBuffer code is non-portable and meant for use in Windows programs, and was written in a style close to that of the Windows API.

My main goals regarding this code review:

  • Find coding mistakes (incorrect buffering, memory leaks, etc.)



  • Enhance the Windows API style of the code.



`// enable Unicode support within the Windows API
#define UNICODE
#define _UNICODE

// skip the inclusion of certain internal headers
#define WIN32_LEAN_AND_MEAN

#include

typedef struct
{
BOOL fError; ///hHeap = hHeap;
pCB->hFile = hFile;
pCB->liOffset = liOffset;
pCB->liLength = liLength;
pCB->cbBufferSize = cbBufferSize;
pCB->fError = FALSE;
pCB->lpbBuffer = NULL;
pCB->index = 0;
pCB->cbAvailable = 0;
pCB->liRemaining = liLength;
}

///
/// @brief Frees the resources allocated by a ChunkBuffer.
/// @note This function does not close the handle of the file
/// associated with the ChunkBuffer; this is on purpose.
/// @param [in,out] pCB ChunkBuffer to deinitialize.
///
void ChunkBuffer_Free(PCHUNKBUFFER pCB)
{
if (pCB->lpbBuffer != NULL)
HeapFree(pCB->hHeap, 0, pCB->lpbBuffer);

pCB->hHeap = INVALID_HANDLE_VALUE;
pCB->hFile = INVALID_HANDLE_VALUE;
pCB->lpbBuffer = NULL;
}

///
/// @brief Returns the ChunkBuffer's error flag.
/// @param [in] pCB ChunkBuffer to inspect.
/// @returns Whether or not the ChunkBuffer is in an error state.
/// @retval TRUE

Solution

This is a good exercise! I've actually had to implement buffering like this before, and I believe that most implementations of the standard library do something similar for file reading and writing by default. Below are my thoughts on how to improve it.

Naming

As always, naming is one of the most important ways you can make code clearer. For the most part your names are pretty good, but one that had me scratching my head a bit was lilength. What is it the length of? It appears to be the length of the chunk the caller wants to read from. If that's the case, I'd name it liChunkLength, and maybe even liNumChunkBytes to make it clear what the units are.

Likewise, I don't like the name of index. First, it doesn't have the cb prefix that the other SIZE_T variables do. Second, is it an index into the file, the chunk, or the buffer? In this case, it's an index into the buffer, so maybe call it cbBufferIndex?

Handling Errors

It seems really odd to me that the errors are stored in the structure instead of being returned by the functions. Are there other Windows functions that work that way? I would not generally do it that way.

And why does the caller need to call a function to determine whether there's an error when it's just a field in their struct? If I were a developer using this code, I would just look at the struct and not call a function.

Why is the error a BOOL? That makes it much harder for a caller to know what went wrong. Looking at the code I see several different classes of errors that are possible: not able to allocate memory, not able to set the file pointer, not able to read from the file, etc.

I would make an enum for the errors and return an error from every function in which an error can occur, and I'd remove the fError field from the struct. maybe something like this:

typedef enum CB_Error {
    CBE_OUT_OF_MEMORY = 100,
    CBE_INVALID_FILE_HANDLE = 101,
    CBE_UNABLE_TO_SET_FILE_POINTER = 102,
    // ...etc.
} CB_Error;


Macros

In general you should avoid macros for anything other than simple named constants. (And with modern C you should use const instead for that.) See here for just one simple example of why macros are problematic.

But if you are going to use macros, it's traditional to #define them at the top of the source rather than inside a function. Are you likely to have another macro with the same name later in the file that you need to #define it in the function and then #undef it in the same function? If so, that's a problem.

Resource Allocation Is Initialization

There's a helpful pattern called RAII. It stands for "Resource Acquisition Is Initialization.". What it means is that you should allocate any resources you need to use a structure or object when you initialize it. In your case, I'd apply it to ChunkBuffer_Init(). I would allocate the memory for pCB->lpbBuffer in ChunkBuffer_Init() instead of in ChunkBuffer_NextByte(). It seems silly that you can successfully initialize a chunk buffer only to fail to get the first byte because there was no memory to allocate the buffer. Moving the allocation will make ChunkBuffer_NextByte() cleaner and smaller.

Performance

Reading a single byte at a time from a buffer is very inefficient and painful to use. As a caller, I'd much rather pass in a buffer of some length and have it fill the entire buffer at once, rather than having to read a single byte at a time. Think about the overhead of checking to see if you've reached the end of the chunk buffer. You have to do that check on every single byte that will be read from the file. If a caller can pass in a buffer that's 100 bytes, or 1000 bytes, those checks will happen 1/100th or 1/1000th as often, meaning better performance!

Code Snippets

typedef enum CB_Error {
    CBE_OUT_OF_MEMORY = 100,
    CBE_INVALID_FILE_HANDLE = 101,
    CBE_UNABLE_TO_SET_FILE_POINTER = 102,
    // ...etc.
} CB_Error;

Context

StackExchange Code Review Q#152599, answer score: 4

Revisions (0)

No revisions yet.