patterncppMinor

Selecting kids for a Christmas play with similar heights

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

sorting codereview cpp c++stackoverflow programming-challenge performance clustering

heightskidswithchristmasplayforselectingsimilar

Problem

I am doing this problem on SPOJ:

My kid's kindergarten class is putting up a Christmas play. (I hope he gets the lead role.) The kids are all excited, but the teacher has a lot of work. She has to produce costumes for a scene with \$K\$ soldiers. She wants to buy all the costumes in the same size, allowing for some small amount of length alteration to be done by the kids' parents later. So she has taken all the kids' height measurements. Can you help her select \$K\$ kids from her class of \$N\$ to play the soldier role, such that the height difference between the tallest and shortest in the group is minimized, and alternations will be easiest? Tell her what this minimum difference is.

INPUT

The first line contains the number of test cases \$T\$. \$T\$ test cases follow each containing 2 lines.

The first line of each test case contains 2 integers \$N\$ and \$K\$.
The second line contains \$N\$ integers denoting the height of the \$N\$ kids.

OUTPUT

Output \$T\$ lines, each line containing the required answer for the corresponding test case.

CONSTRAINTS

\$T \le 30\$

\$1 \le K \le N \le 20000\$

\$1 \le \text{height} \le 1000000000\$

My approach

I am first storing all the heights in the array and then sorting the array using quicksort (I also tried std::sort), and after that finding the minimum difference using sliding window (I do not exactly know the name of algorithm, I heard it is known as sliding window).

After optimizing my code to the best that I can, my code takes 0.05s to execute all test cases, but when I see the best result, it is 0.00s or 0.02s, so less than 0.05s.

By the way, I had also tested with std::sort but it was only marginally faster (0.05 s instead of 0.06 s). How can I optimize it more?

```
#include
#include
#include
int partition(int *arr, const int left, const int right) {
const int mid = left + (right - left) / 2;
const int pivot = arr[mid];
// move the

Solution

Time is mostly spent in I/O

I played with your program and found that most of the time was spent doing I/O. However, from your comments, it appears you already improved your I/O so that your best solution is now 0.02 seconds instead of 0.05 seconds.

I took your improved I/O code and took it one step further. What I did was read the entire input into a buffer using a single fread(), and then instead of using getchar_unlocked(), I just advanced a pointer through the buffer using *p++.

Why is *p++ faster than getchar_unlocked()? Here is what getchar_unlocked() does (not exactly but something close to this):

char getchar_unlocked()
{
    if (stdin.ptr >= stdin.buffer_len) {
        refill_stdin_buffer();
        stdin.ptr = 0;
    }
    return stdin.buffer[stdin.ptr++];
}

So for every character you read, it does a check to see if it needs to reload the buffer. If you read the entire input from stdin to a local buffer, you are essentially skipping and end of buffer check for each character you read.

Update: Later I found that my I/O changes didn't matter. Both yours and mine were able to achieve 0.00 time, as long as radix sort was used (see next section).

Radix sort

Radix sort is 0.02 seconds faster than std::sort for this test. I know because I tried it both ways. First I tried radix sort, which took 0.00 seconds overall. Then I replaced radix sort with std::sort and got 0.02 seconds.

In general, radix sort should be faster than quicksort for large arrays of integers. The radix sort I used operates on bytes at a time, so for 32-bit integers it uses exactly 4 rounds. On each round it does a counting pass followed by a rearranging pass. So in total, it does approximately \$8n\$ operations to sort an array of \$n\$ integers, plus a fixed overhead of 1024 operations to deal with the buckets.

From my own testing, I found that radix sort is 5x faster than std::sort for sorting arrays of 10 million integers. For 20000 integers, it is around 4x faster.

The final program, which achieved 0.00 seconds

Here is my program. You can see in the SPOJ status that I got 0.00 seconds with it. FYI, my first entry was a C program that got 0.00 seconds, but I changed it to the following C++ program:

#include 
#include 

#define MAX                20000

// 30 testcases, 11 characters per number, 20000 numbers
char buf[MAX*11*30];
int array    [MAX];
int arrayTemp[MAX];

static inline int readInt(char **pPtr)
{
    char *p   = *pPtr;
    int   ret = 0;
    char  c   = *p++;

    while (c = '0');

    *pPtr = p;
    return ret;
}

void radixsort(int *a, int len)
{
    int *space       = arrayTemp;
    int *current     = a;
    int *scratch     = space;
    int *tmp         = NULL;
    int  radixByte   = 0;
    int  i           = 0;
    int  bucket[256];

    // Iterate once per byte.
    for (radixByte=0;radixByte> shift) & 0xff]++;

        // Change bucket to be cumulative count.
        for (i=1;i=0;i--)
            scratch[--bucket[(current[i] >> shift) & 0xff]] = current[i];

        // Switch arrays
        tmp     = current;
        current = scratch;
        scratch = tmp;
    }
}

int main(void)
{
    char *p = buf;
    fread(buf, 1, sizeof(buf), stdin);

    int t = readInt(&p);
    while (t-- > 0) {
        int n = readInt(&p);
        int k = readInt(&p);

        for (int i = 0; i < n; i++)
            array[i] = readInt(&p);

        if (k == 1) {
            puts("0");
            continue;
        }

        radixsort(array, n);

        int diff = INT_MAX;
        n -= k;
        k--;
        for (int i = 0; i <= n; i++) {
            int newDiff = array[i+k] - array[i];
            if (newDiff < diff)
                diff = newDiff;
        }
        printf("%d\n", diff);
    }
    return 0;
}

Code Snippets

char getchar_unlocked()
{
    if (stdin.ptr >= stdin.buffer_len) {
        refill_stdin_buffer();
        stdin.ptr = 0;
    }
    return stdin.buffer[stdin.ptr++];
}

#include <bits/stdc++.h>
#include <cstdio>

#define MAX                20000

// 30 testcases, 11 characters per number, 20000 numbers
char buf[MAX*11*30];
int array    [MAX];
int arrayTemp[MAX];

static inline int readInt(char **pPtr)
{
    char *p   = *pPtr;
    int   ret = 0;
    char  c   = *p++;

    while (c < '0')
        c = *p++;
    do {
        ret = (ret << 3) + (ret << 1) + c - 48;
        c   = *p++;
    } while (c >= '0');

    *pPtr = p;
    return ret;
}

void radixsort(int *a, int len)
{
    int *space       = arrayTemp;
    int *current     = a;
    int *scratch     = space;
    int *tmp         = NULL;
    int  radixByte   = 0;
    int  i           = 0;
    int  bucket[256];

    // Iterate once per byte.
    for (radixByte=0;radixByte<4;radixByte++) {
        int shift = (radixByte << 3);
        memset(bucket, 0, sizeof(bucket));

        // Count how many of each byte.
        for (i=0;i<len;i++)
            bucket[(current[i] >> shift) & 0xff]++;

        // Change bucket to be cumulative count.
        for (i=1;i<256;i++)
            bucket[i] += bucket[i-1];

        // Copy from current to scratch, using bucket counts.
        for (i=len-1;i>=0;i--)
            scratch[--bucket[(current[i] >> shift) & 0xff]] = current[i];

        // Switch arrays
        tmp     = current;
        current = scratch;
        scratch = tmp;
    }
}

int main(void)
{
    char *p = buf;
    fread(buf, 1, sizeof(buf), stdin);

    int t = readInt(&p);
    while (t-- > 0) {
        int n = readInt(&p);
        int k = readInt(&p);

        for (int i = 0; i < n; i++)
            array[i] = readInt(&p);

        if (k == 1) {
            puts("0");
            continue;
        }

        radixsort(array, n);

        int diff = INT_MAX;
        n -= k;
        k--;
        for (int i = 0; i <= n; i++) {
            int newDiff = array[i+k] - array[i];
            if (newDiff < diff)
                diff = newDiff;
        }
        printf("%d\n", diff);
    }
    return 0;
}

Context

StackExchange Code Review Q#153118, answer score: 3

Revisions (0)

No revisions yet.