HiveBrain v1.2.0
Get Started
← Back to all entries
patterncMinor

Type-agnostic BubbleSort in C

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
typebubblesortagnostic

Problem

I'm trying out different sorting algorithms for learning purposes, here I'm doing a "type-agnostic" bubbleaort.

I modeled the function signature after the standard library's qsort. I know the bubblesort algorithm in general is not that fast, but I'm looking for a review in particular about the memory management.

In my implementation I'm accepting a void pointer to an array, the size of the array, the size of the single elements and the pointer to the comparison function. Then I convert the void pointers to char pointers, and I do manually the pointer arithmetics (I avoided nonstandard arithmetics on void *).

The part I'm not very happy about is the need for a buffer (dynamically allocated), and the use of memcpy to swap the memory of the two elements. I'd love to know if that part can be improved even slightly.

I'm using a pointer to the beginning and the end of the array, instead of using indexes with [] bracket notation.

Also, what about the if (buffer == NULL) check and "halting" with exit()? Is there a more elegant way to fail here?

I also could use one variable instead of two (current and left), but doing one more pointer arithmetic in the inner loop.

Here's my current bubblesort implementation:

```
/**
* Bubblesort algorithm. Compatible with any data type.
* @param base A void pointer to the array's first element.
* @param num The number of elements in the array.
* @param size The size of the data type.
* @param compar A pointer to the comparison function.
*/
void bubblesort(void* base, size_t num, size_t size,
int (compar)(const void, const void*)
) {
const char arr = (const char ) base;
char buffer, left, current, right;

if (arr != NULL && size > 0 && num > 1) {
buffer = (char *) malloc(size);

if (buffer == NULL) {
fprintf(stderr, "Out of memory.");

exit(EXIT_FAILURE);
}

right = (char ) arr + num size;
while (right > arr) {

Solution

Error handling

Some would argue that it's unacceptable to exit() in case of an "out of memory" error occurring in a library function. I would invite them to tell me how to actually recover in those situations, because I doubt many "well-designed" applications do so.

That said, many libraries indeed have the courtesy of telling the caller that they can't complete their task, rather than killing the process. For a language like C (which lacks support for sum types and exceptions), I'm currently undecided between a few approaches, but I'd always make sure that the caller can distinguish between success and failure by a simple inspection of the return value.

The easiest approach would be to change the return value of bubblesort to a library-specific enum of error codes, which includes exactly one 'success' value (in C libraries, the common value would be 0) and at least one 'generic failure' value. If you want your users to be able to distinguish between failure reasons, just add more reason to the enum.

Once you have done this, you can also explicitly signal for other error conditions (arr == 0, size == 0), if you consider them errors.

Note: for some pet projects, I used to just return a bool, where true would mean success. Since I realised that this runs counter to the UNIX tradition of "zero means success", I no longer use that in public APIs.

Swapping

By introducing a memswap function

void memswap(void *restrict left, void *restrict right, size_t length);


we can move a lot of logic out of bubblesort itself. This also means we would move the malloc and free to it, which seems like a bad idea for a function in the inner loop.

But don't worry: you can implement memswap in any way you like. For instance, memswap could swap the data between aaaa and bbbb byte-by-byte:

void memswap(
    void *restrict left,
    void *restrict right,
    size_t length
) {
    char *left2 = (char *)left;
    char *right2 = (char *)right;

    for (size_t cursor = 0; cursor < length; cursor++) {
        char tmp = left2[cursor];
        right2[cursor] = left2[cursor];
        left2[cursor] = tmp;
    }
}


Now that you have no more malloc, you won't have to deal with the possibility of it returning a null pointer.

Of course, a robust memswap would not ever cause any UB (which could imply exit()), so you should add some checks for null pointers here. Whether that's an error condition, or that it causes memswap to do nothing is up to you as a library designer. The same goes for the check that left and right don't overlap.

Other functional remarks

  • You forgot a trailing newline in your error message.



  • Checking for the cases num == 0 or num == 1 is an attempt at optimisation (right?). I wouldn't do this, unless a benchmark shows that clients of the library are getting significant poor performance because of it (but then it might be a better idea to figure out why these calls are made to begin with).



-
In the line

right = (char *) arr + num * size;


you should definitely protect yourself against a potential integer overflow in num * size.

Stylistic remarks

  • Don't cast malloc's return value.



  • I assume that this isn't all code, but I want to clarify that the structure of files matters. My specific remark here would be that I always make a forward declaration to the functions I define.



-
Positive conditions, even when negated, are easier to understand than negative ones. For instance, the condition in

if (arr != NULL && size > 0 && num > 1) {


takes me more time to understand than

if (!(arr == NULL || size == 0 || num <= 1)) {


-
In terms of code real estate, vertical size is cheaper than horizontal size. With that, I mean that I think a failure check like

if (!(arr == NULL || size == 0 || num <= 1)) {


is more readable (and more VCS-friendly ;)) when it's split into multiple blocks, or that the clauses in the condition at least get their own line. Yes, your code grows in length, but I'd prefer boring-looking code that's obvious over "elegant" short code that takes effort to get through (and I say that as a Haskell user).

  • As of C99, 0 is the (canonical) null pointer constant (meaning that when you convert a constant integer expression with value 0 to a pointer, you get a null pointer; the actual "address" of that null pointer is implementation-defined, though). NULL is still valid, but I wouldn't use it.



  • const char arr is confusing, as you know for a fact that you will modify arr. Just use char arr, avoiding pointer casts later on.



  • The comments around the swap are not particularily helpful. I prefer many functions and named constants over comments and magic.



  • You could turn either loop into a for loop, but I recommend to keep this as a while here. The for-notation only helps readability when it concerns a single iterator, and has no performance benefit over while.



  • Thank you for

Code Snippets

void memswap(void *restrict left, void *restrict right, size_t length);
void memswap(
    void *restrict left,
    void *restrict right,
    size_t length
) {
    char *left2 = (char *)left;
    char *right2 = (char *)right;

    for (size_t cursor = 0; cursor < length; cursor++) {
        char tmp = left2[cursor];
        right2[cursor] = left2[cursor];
        left2[cursor] = tmp;
    }
}
right = (char *) arr + num * size;
if (arr != NULL && size > 0 && num > 1) {
if (!(arr == NULL || size == 0 || num <= 1)) {

Context

StackExchange Code Review Q#112125, answer score: 7

Revisions (0)

No revisions yet.