HiveBrain v1.2.0
Get Started
← Back to all entries
patterncMinor

Longest lines using priority queue

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
priorityusinglongestqueuelines

Problem

Inspired by a comment on this question, I thought I'd exercise my C skills by implementing something like it using a priority queue.

The idea is to be able to invoke the program like this:


./longline 10 < /usr/share/dict/linux.words

And get it to print the 10 longest lines in the linux.words file. I chose to implement the program using getline() which is a handy POSIX-2008 function that automatically allocates memory for the line it reads and can reuse the buffer it allocates. It only reallocates when a new line is longer. On my machine, using this function (defined in stdio.h) requires the #define _GNU_SOURCE that you see at the top of the file.

The goals

The goal was primarily to implement something like a priority queue using an unordered array as a heap. I write "something like" a priority queue because this has one particular feature -- it only keeps the lowest priority items if a new item is added when the queue is already full.

I also wanted to minimize memory allocations and reallocations. Using it on the linux.words file mentioned above (which has 479828 words in it, one per line), valgrind reports:


total heap usage: 108 allocs, 108 frees, 13,136 bytes allocated

which is satisfyingly parsimonious for a file that's 4953680 bytes long.

Notes

The first entry in the array items[0] is generally not used; the top of the queue is located at items[1]. In particular, items[0] is only used as a convenient scratch space within pq_insert().

Questions

I'm interested in a general review, and in particular anywhere the code could be better written or more clearly written.

I'm also not entirely happy with the creation of the output_queue to reverse the order of the items in the queue. Is there a better way to do that?

longline.c

```
#define _GNU_SOURCE
#include
#include
#include
#include

typedef struct {
int priority;
char *data;
} item;

/*
* Swap the contents of two items.
*/
void item_swap(item a, item

Solution

Allocation one too big

In pq_create(), this allocation:

priority_queue *pq = malloc(sizeof(priority_queue) + sizeof(item) * (size + 1));


could be smaller by one item:

priority_queue *pq = malloc(sizeof(priority_queue) + sizeof(item) * size);


The priority_queue structure already contains space for one item, so you only need size more items.

Dead code

In pq_free(), this line doesn't do anything, and can be deleted:

pqueue = NULL;


The intent may have been to NULL out the pointer, but this line doesn't actually cause the pointer (in the caller) to be NULLed. It only affects the local copy inside of pq_free(), which isn't going to be used again. The compiler is probably optimizing that line out anyway, but the line may be misleading in the sense that it may suggest that the caller's copy is being NULLed when it isn't.

Make pq_adjust() do more

The current pq_adjust() function does one level of adjustment, and depends on the caller to call it in a loop. But the three call sites for this function all have the same loop, which adjusts from the top of the heap to as far down as necessary.

If you modified pq_adjust() to always start at the top of the heap and adjust as far down as necessary (using a loop), you could avoid the three loops at the call sites. It would also get rid of all the ugly *i pointer indirections.

Duplicated code

pq_pop() and pq_pop_no_delete() are almost identical. You could remove the code duplication by rewriting pq_pop() as:

void pq_pop(priority_queue * pqueue)
{
    free(pq_pop_no_delete(pqueue));
}


Reversing the heap

Instead of using a second heap to reverse the original one, you could just pop each element off the heap and store the popped data at the end of the heap where you just vacated an entry. For example:

size_t last_used = pqueue->last_used;
    for (size_t i = last_used; i; --i) {
        pqueue->items[i].data = pq_pop_no_delete();
    }
    for (size_t i = 1; i items[i].data);
        free(pqueue->items[i].data);
    }


Alternatively, you could just allocate new array to do the same thing:

size_t last_used = pqueue->last_used;
    char **result = malloc(last_used * sizeof(char *));
    for (size_t i = last_used; i; --i) {
        result[i-1] = pq_pop_no_delete();
    }
    for (size_t i = 0; i < last_used; i++) {
        printf("%s", result[i]);
        free(result[i]);
    }
    free(result);

Code Snippets

priority_queue *pq = malloc(sizeof(priority_queue) + sizeof(item) * (size + 1));
priority_queue *pq = malloc(sizeof(priority_queue) + sizeof(item) * size);
pqueue = NULL;
void pq_pop(priority_queue * pqueue)
{
    free(pq_pop_no_delete(pqueue));
}
size_t last_used = pqueue->last_used;
    for (size_t i = last_used; i; --i) {
        pqueue->items[i].data = pq_pop_no_delete();
    }
    for (size_t i = 1; i <= last_used; i++) {
        printf("%s", pqueue->items[i].data);
        free(pqueue->items[i].data);
    }

Context

StackExchange Code Review Q#131919, answer score: 3

Revisions (0)

No revisions yet.