File-based fixed-record merge sort

Submitted by: @import:stackexchange-codereview

Problem

A problem posted here to Code Review prompted me to experiment with a general-purpose merge-sort algorithm for any fixed-length record data.

External / File-based mergesort

I answered that question, which deals specifically with int values in a file, and suggested using a MappedByteBuffer.

These NIO concepts can be complicated, so I played with the code, and came up with what I believe to be a good solution for a more general case, where the format of the data can be less structured than a simple 4-byte int.

To separate the data format from the implementation, the code exposes an interface that describes the format to the sort algorithm. The interface has two methods:

  • recordLength()
  • compare(ByteBuffer, ByteBuffer)

The recordLength() value is used to manage the byte buffers. The compare(...) method should parse a single record off each byte buffer (each of which will already be positioned correctly) and return an integer that follows the standard Java comparison contract: negative, 0, or positive if the record in the first buffer is smaller than, equal to, or larger than the record in the second.

A 'simple' use case for this code, which assumes the data contains 4-byte int values, would be:

private static final class IntParser implements FixedRecordSortFile.RecordParser {

    @Override
    public int recordLength() {
        // int values are 4 bytes.
        return 4;
    }

    @Override
    public int compare(ByteBuffer bufferOne, ByteBuffer bufferTwo) {
        // parse the int from each buffer, and compare them.
        return Integer.compare(bufferOne.getInt(), bufferTwo.getInt());
    }

}


Using this, you can sort a file with:

FixedRecordSortFile.sort(Paths.get("path/to/data"), new IntParser());
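
For a format less structured than a bare int, the same two methods should be enough. The following is a minimal sketch, not part of the original code. It assumes a hypothetical 12-byte record made of an 8-byte long key followed by a 4-byte payload, and it declares a local `RecordParser` interface as a stand-in for `FixedRecordSortFile.RecordParser`, whose definition is not shown here. The names `KeyedRecordDemo`, `KeyedRecordParser`, and `record` are invented for illustration.

```java
import java.nio.ByteBuffer;

final class KeyedRecordDemo {

    // Stand-in for FixedRecordSortFile.RecordParser (assumed shape, based on
    // the two methods described in the text).
    interface RecordParser {
        int recordLength();
        int compare(ByteBuffer bufferOne, ByteBuffer bufferTwo);
    }

    // Hypothetical record layout: 8-byte long key, then 4 bytes of payload.
    static final class KeyedRecordParser implements RecordParser {

        @Override
        public int recordLength() {
            return Long.BYTES + Integer.BYTES; // 12 bytes per record
        }

        @Override
        public int compare(ByteBuffer bufferOne, ByteBuffer bufferTwo) {
            // Only the key takes part in the ordering; the payload is ignored.
            return Long.compare(bufferOne.getLong(), bufferTwo.getLong());
        }
    }

    // Demo helper: builds a single record, positioned at its first byte.
    static ByteBuffer record(long key, int payload) {
        ByteBuffer buf = ByteBuffer.allocate(12);
        buf.putLong(key).putInt(payload);
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        RecordParser parser = new KeyedRecordParser();
        // Key 1 sorts before key 2, whatever the payloads are.
        System.out.println(parser.compare(record(1L, 99), record(2L, 0)) < 0); // prints "true"
    }
}
```

As with getInt() in IntParser, getLong() advances each buffer's position, so this sketch relies on the buffers being positioned at the start of a record before every comparison.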


The actual sort code is as follows:

```
package mergesort;

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChann
```

Solution

In smallSort, the same lines appear in both branches of the if-else. You could hoist them above the branch:

int ck = parser.compare(abuf.buffer, bbuf.buffer);
abuf.positionAtRecord(rec);
bbuf.positionAtRecord(rec + 1);
outbuf.positionAtRecord(outpos++);
if (ck <= 0) {
    outbuf.copyRecord(abuf.buffer);
    outbuf.positionAtRecord(outpos++);
    outbuf.copyRecord(bbuf.buffer);
} else {
    outbuf.copyRecord(bbuf.buffer);
    outbuf.positionAtRecord(outpos++);
    outbuf.copyRecord(abuf.buffer);
}


Then, near the end of the method, the post-increment of outpos is pointless, because the incremented value is not used afterwards:

if (max < recordCount) {
    abuf.positionAtRecord(max);
    outbuf.positionAtRecord(outpos++);
    outbuf.copyRecord(abuf.buffer);
}


A for loop would be more natural for this piece:

long pos = 0;
while (pos < size) {
    locateWindow(pos);
    source.locateWindow(pos);
    buffer.put(source.buffer);
    pos += windowSize;
}
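
A rough sketch of the for-loop shape follows. It is self-contained so that it can run on its own: the per-window work (locateWindow(pos), source.locateWindow(pos), buffer.put(source.buffer)) is replaced by recording the window start offsets, and the names `WindowLoopDemo` and `windowStarts` are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class WindowLoopDemo {

    // Returns the start offset of every window visited, in order.
    // Adding to the list stands in for the real per-window work.
    static List<Long> windowStarts(long size, long windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        List<Long> starts = new ArrayList<>();
        // Initialisation, condition, and step are all visible in the loop header.
        for (long pos = 0; pos < size; pos += windowSize) {
            starts.add(pos);
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(windowStarts(10, 4)); // prints "[0, 4, 8]"
    }
}
```

The for form also limits the scope of pos to the loop itself.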


The initialization of tmp is pointless here, because you overwrite it anyway inside the loop:

FastFile tmp = null;
for (int bs = 2; bs < recordCount; bs <<= 1) {
    mergeSort(from, to, bs);
    tmp = from;
    from = to;
    to = tmp;
}
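
One way to avoid the dummy initialization is to declare tmp inside the loop, where it is assigned. The sketch below shows only the swap pattern: plain strings stand in for the two FastFile instances, the mergeSort call is left as a comment, and `SwapDemo` and `pingPong` are invented names.

```java
final class SwapDemo {

    // Swaps "from" and "to" once per pass, as the merge loop does,
    // and returns the final pair as {from, to}.
    static String[] pingPong(String from, String to, int recordCount) {
        for (int bs = 2; bs < recordCount; bs <<= 1) {
            // mergeSort(from, to, bs) would run here in the real code.
            final String tmp = from; // declared where it is first assigned
            from = to;
            to = tmp;
        }
        return new String[] {from, to};
    }

    public static void main(String[] args) {
        // With 10 records the loop runs for bs = 2, 4, 8: three swaps.
        String[] result = pingPong("A", "B", 10);
        System.out.println("from=" + result[0] + " to=" + result[1]); // prints "from=B to=A"
    }
}
```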


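The explicit call to super() in the FileBuffer constructor is pointless. When a constructor does not begin with an explicit this(...) or super(...) call, the compiler inserts a no-argument super() call automatically: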

public FileBuffer(FileChannel channel, MapMode mode, long windowSize, long size,
                int recordLength, int recordsPerWindow) {
    super();
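
The implicit call can be seen in a small sketch. The classes here are hypothetical and unrelated to FileBuffer, whose superclass is not shown in this excerpt.

```java
final class ImplicitSuperDemo {

    static class Base {
        final String tag;

        Base() {
            tag = "base-initialised";
        }
    }

    static class Derived extends Base {
        final int recordLength;

        Derived(int recordLength) {
            // No explicit super(): the compiler inserts the call to Base() here.
            this.recordLength = recordLength;
        }
    }

    public static void main(String[] args) {
        // The Base constructor has run even though Derived never calls it explicitly.
        System.out.println(new Derived(4).tag); // prints "base-initialised"
    }
}
```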


Pointless final modifiers

What's the point of the final modifier on a private static method like this one:

private static final int calculateWindow(final int approxSize, final int recordlength) {


Maybe I don't see something, but it seems pointless: a private method is not inherited by sub-classes, so there is nothing for final to prevent.

Actually, even on public static methods, I find the purpose of final questionable. When a method is static, a sub-class already cannot override it. Sure, a sub-class could still hide it by declaring a static method with the same signature, but is that a realistic concern? If a sub-class hides a parent's static method, how would that be a problem? A static final method seems excessively cautious.
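
To make the difference concrete, here is a small sketch with hypothetical classes. Calls to static methods are resolved at compile time from the class named in the call, so a sub-class method with the same signature does not affect callers of the parent's method.

```java
final class StaticHidingDemo {

    static class Parent {
        static String name() {
            return "parent";
        }

        static final String fixed() {
            return "fixed";
        }
    }

    static class Child extends Parent {
        // Hides Parent.name(); this is not an override.
        static String name() {
            return "child";
        }

        // Uncommenting the next line is a compile-time error,
        // because a final static method cannot be hidden:
        // static String fixed() { return "other"; }
    }

    public static void main(String[] args) {
        System.out.println(Parent.name()); // prints "parent"
        System.out.println(Child.name());  // prints "child"
        System.out.println(Child.fixed()); // prints "fixed"
    }
}
```

So the only thing final adds to a static method is that the commented-out declaration is rejected by the compiler.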

Context

StackExchange Code Review Q#61586, answer score: 3
