HiveBrain v1.2.0
Get Started
← Back to all entries
snippetjavaMinor

External bottom up merge sort too slow

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
mergetooslowbottomsortexternal

Problem

My external sort function works but it is way too slow. I think the bottleneck is the merge function. How can I make it faster. Is there a better merge function I can use for merging into a file using random access file objects and buffers?

private static void merge(String a, String b,
int l, int m, int r)throws FileNotFoundException, IOException
{
    RandomAccessFile a1f = new RandomAccessFile(a,"rw");
    RandomAccessFile b1f = new RandomAccessFile(b,"rw");
    a1f.seek(l*4);
    b1f.seek(l*4);

    for(int i = l; i<m;i++){
        a1f.seek(i*4);
        b1f.seek(i*4);
        int num = a1f.readInt();
        b1f.writeInt(num);
    }

    for(int j=m; j<r; j++){
        a1f.seek((m + r - j - 1)*4);
        b1f.seek(j*4);
        int num = a1f.readInt();
        b1f.writeInt(num);
    }
    int i = l;
    int j = r - 1;

    //for (int i = l; i < m; i++) b[i] = a[i];
    //for (int j = m; j < r; j++) b[j] = a[m + r - j - 1];
    //if (b[j] < b[i]) a[k] = b[j--];
    //else a[k] = b[i++];

    for (int k = l; k < r; k++){
        b1f.seek(j*4);
        int b_j = b1f.readInt();
        b1f.seek(i*4);
        int b_i = b1f.readInt();
        if(b_j<b_i){
            b1f.seek((j--)*4);
            a1f.seek(k*4);
            a1f.writeInt(b1f.readInt());
        }
        else{
            b1f.seek((i++)*4);
            a1f.seek(k*4);
            a1f.writeInt(b1f.readInt());
        }

    }

    a1f.close();
    b1f.close();
}

Solution

The most glaring problem in your code here, is that you open and close the file on each call to the merge function. You should instead open it once in the calling function, and pass in the already-open files to the merge method. The open and close operaions are both slow, and should be minimized.

I recommend that you re-work that part of the solution, then re-post as a 'follow on', but then also include the calling part of the function in to your question so that more of the overall structure of your solution can be reviewed.

The variables a, b, l, r, m i, j, and k are also all horrible names... you should rename them to be 'self-documenting'. They should be called what they are. The variable l is particularly problematic because it is so easy to confuse with 1.

I struggle to understand the difference between the a and b files. It appears that you are copying the first chunk of data from the a-file to the b-file, and then merging from two different places in the b file back to the a-file.

I would recommend instead having two read-poitns on the b file and then write back to the A. The extra copy is slowing you down too. There's no need to copy from a-to b if you can jsut swap them...

Context

StackExchange Code Review Q#66369, answer score: 4

Revisions (0)

No revisions yet.