HiveBrain v1.2.0
Get Started
← Back to all entries
patterncMajor

Can I buff up your file?

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
canfileyourbuff

Problem

Here I have a method for reading the contents of a file into a buffer and returning whether or not it is successful.

Here is what I would like reviewed:

-
Speed: Am I reading in the file as fast as possible?

-
Bugs: Are there cases where my code could fail or give unexpected behavior?

-
Syntax/Styling: How does my code look? How could I make it look better? Anything I'm doing wrong syntax-wise?

int getFileContents(const char *file, void **contents, size_t *size)
{
    // Open file for reading
    int fd = open(file, O_RDONLY);
    if (fd  0) {
        n = read(fd, buf, BUF_SIZE);
        if (n < 0) {
            close(fd);
            free(result);
            return -1;
        }

        unsigned char *tmp = (unsigned char*) realloc(result, readb + n);
        if (!tmp) {
            close(fd);
            free(result);
            return -1;
        }

        result = tmp;
        memcpy(result + readb, buf, n);
        readb += n;
        rest -= n;
    }

    close(fd);
    *contents = result;
    *size = total;

    return 0;
}

Solution


  1. Bugs



-
If the open succeeds but the lseek fails (as it might if the file is a named pipe), the function omits to close the file descriptor, thus leaking it.

-
If read reaches the end of the file (as it might if the file was truncated after the lseek), then it returns 0. This will cause the function to go into an infinite loop.

-
There's no check on the result of the second call to lseek. If this call fails, then the function will go into the infinite loop described above.

  1. Review



-
There's no specification. What does this function do? How do I call it? What does it return?

-
This is a worst-case \$\Theta(n^2)\$ algorithm, because realloc may have to copy the result each time it grows, and you grow the buffer by at most BUF_SIZE bytes each time you call realloc. It's usual in this kind of situation to grow the result exponentially, by multiplying the result size by some constant on each realloc call.

-
Even ignoring the copying due to realloc, each byte read from the file is copied again using memcpy. It would be better to realloc the result first, and then read directly into the result.

-
You go to the trouble of finding the length of the file, so why not read it all in one go?

result = malloc(total);
if (!result)
    /* handle error */
n = read(fd, result, total);
if (n != total)
    /* handle error */


-
Consider using fstat instead of lseek to determine the size of the file.

-
It's not a good idea to put multiple statements on the same line, like this:

if (fd < 0) return -1;


That's because in most debuggers you won't be able to set a breakpoint on the return statement.

-
There's repetition in the handling of the four failure cases. This makes the function difficult to modify (because you might fail to adjust one or more of the cases). The bug in §1.1 above might have been caused by this problem.

-
For reading the entire contents of a file into memory, consider using mmap.

mmap has advantages over read. (i) There's no need to allocate a buffer: the kernel maps the file directly into the process's virtual memory. (ii) mmap operates lazily: it maps the file into memory but does not actually copy any part of it until that page is touched, so programs that randomly access the file don't have to wait for the whole file to load.

mmap also has disadvantages. (i) mmap only operates on ordinary files (not pipes, sockets, or other kinds of file descriptor). (ii) Mapping lots of files creates lots of entries in the process's virtual memory table, and on some operating systems (notably OS X) this can lead to poor memory performance.

  1. Revised code



#include 
#include 
#include 
#include 

/* Map the contents of file into memory: if successful, update
 * *contents to point to the file contents, update *size to be the
 * size of the file, and return 0. Otherwise, return -1.
 */
int getFileContents(const char *file, void **contents, size_t *size)
{
    int fd = open(file, O_RDONLY);
    if (fd < 0)
        goto fail_open;

    struct stat st;
    if (fstat(fd, &st) < 0)
        goto fail_fstat;

    void *result = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (result == MAP_FAILED)
        goto fail_mmap;

    *contents = result;
    *size = st.st_size;
    close(fd);
    return 0;

fail_mmap:
fail_fstat:
    close(fd);
fail_open:
    return -1;
}


  1. Answers to comments



In comments, lesto suggests:

-

I would return a different number for every error, to help the user of the function to identify the problem and try to recover.

In this context (Unix systems programming), it's conventional for a function to return a single distinguished value (typically −1) to indicate an error, with the cause of the error being stored in errno. See for example the POSIX specification of open:


Upon successful completion, these functions shall open the file and return a non-negative integer representing the lowest numbered unused file descriptor. Otherwise, these functions shall return -1 and set errno to indicate the error.

It makes sense to stick to the Unix convention, so that the caller can apply the same kind of error handling to getFileContents as they do to other system interfaces.

-

Every error between open and close need the call to close, and leaving it inline is easier than check the list of label is correct, IMHO. It is true that you are writing some more lines.

If you write out the error-handling code inline, then your functions have length that's quadratic in the number of error cases! This is disastrous for readability and maintainability. It is all too easy to forget one of the cases, as in the bug described in §1.1 above.

An alternative to goto is to use nested if statements. In this case the code would become:

```
int fd = open(file, O_RDONLY);
if (fd >= 0) {
struct stat st;
if (fstat(fd, &st) == 0) {
void *result = mmap(NULL, st.st_size, PROT_READ, MAP

Code Snippets

result = malloc(total);
if (!result)
    /* handle error */
n = read(fd, result, total);
if (n != total)
    /* handle error */
if (fd < 0) return -1;
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the contents of file into memory: if successful, update
 * *contents to point to the file contents, update *size to be the
 * size of the file, and return 0. Otherwise, return -1.
 */
int getFileContents(const char *file, void **contents, size_t *size)
{
    int fd = open(file, O_RDONLY);
    if (fd < 0)
        goto fail_open;

    struct stat st;
    if (fstat(fd, &st) < 0)
        goto fail_fstat;

    void *result = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (result == MAP_FAILED)
        goto fail_mmap;

    *contents = result;
    *size = st.st_size;
    close(fd);
    return 0;

fail_mmap:
fail_fstat:
    close(fd);
fail_open:
    return -1;
}
int fd = open(file, O_RDONLY);
if (fd >= 0) {
    struct stat st;
    if (fstat(fd, &st) == 0) {
        void *result = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (result != MAP_FAILED) {
            *contents = result;
            *size = st.st_size;
            close(fd);
            return 0;
        }
    }
    close(fd);
}
return -1;

Context

StackExchange Code Review Q#91728, answer score: 21

Revisions (0)

No revisions yet.