File-based fixed-record merge sort
Problem
A problem posted here to Code Review prompted me to experiment with a general-purpose merge-sort algorithm for any fixed-length record data.
External / File-based mergesort
I answered that question, which specifically deals with int values in a file, and proposed an answer that suggested a MappedByteBuffer.
These NIO concepts can be complicated, so I played with the code and came up with what I believe to be a good solution for a more general case, where the format of the data can be less structured than a simple 4-byte int.
To abstract the data format away from the implementation, the code exposes an interface that describes the record format to the sort algorithm. The two methods on the interface are:
recordLength()
compare(ByteBuffer, ByteBuffer)
The recordLength is used to manage the byte buffers, and the compare(...) method should parse a single record off each byte buffer (which will already be correctly positioned) and return an integer that follows the standard Java contract: negative, 0, or positive if the record on the first buffer is smaller than, equal to, or larger than the record on the second.
A 'simple' use case for this code, which assumes the data contains 4-byte int values, would be:
```
private static final class IntParser implements FixedRecordSortFile.RecordParser {
    @Override
    public int recordLength() {
        // int values are 4 bytes.
        return 4;
    }
    @Override
    public int compare(ByteBuffer bufferOne, ByteBuffer bufferTwo) {
        // parse the int from each buffer, and compare them.
        return Integer.compare(bufferOne.getInt(), bufferTwo.getInt());
    }
}
```
Using this, you can sort a file with:
```
FixedRecordSortFile.sort(Paths.get("path/to/data"), new IntParser());
```
The actual sort code is as follows:
```
package mergesort;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChann
```
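To illustrate the recordLength() / compare(...) contract described above for something other than a bare int, here is a minimal sketch of a parser for a wider record. The 16-byte layout (an 8-byte long key followed by 8 bytes of payload), the KeyedRecordParser name, and the local RecordParser stand-in interface are all assumptions made for illustration; the stand-in only mirrors the two methods named in the question so that the sketch compiles on its own.

```java
import java.nio.ByteBuffer;

public class KeyedRecordParserDemo {

    /** Stand-in for FixedRecordSortFile.RecordParser (assumed shape: the two methods named in the question). */
    interface RecordParser {
        int recordLength();
        int compare(ByteBuffer bufferOne, ByteBuffer bufferTwo);
    }

    /** Hypothetical 16-byte record: an 8-byte long key followed by 8 bytes of payload. */
    static final class KeyedRecordParser implements RecordParser {
        @Override
        public int recordLength() {
            // 8-byte key + 8-byte payload.
            return 16;
        }

        @Override
        public int compare(ByteBuffer bufferOne, ByteBuffer bufferTwo) {
            // Only the leading key takes part in the ordering; the payload is never read.
            return Long.compare(bufferOne.getLong(), bufferTwo.getLong());
        }
    }

    public static void main(String[] args) {
        RecordParser parser = new KeyedRecordParser();
        ByteBuffer a = ByteBuffer.allocate(parser.recordLength());
        ByteBuffer b = ByteBuffer.allocate(parser.recordLength());
        // Write key then payload, then flip so each buffer is positioned at the record start.
        a.putLong(7L).putLong(111L).flip();
        b.putLong(42L).putLong(222L).flip();
        System.out.println(parser.compare(a, b) < 0);
    }
}
```

Because compare(...) reads from the buffers with relative gets, it advances their positions, which is why the caller has to position each buffer at a record boundary before every comparison.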
Solution
In smallSort, you have some duplicated lines in both branches of the if-else, which you could extract and do before:
```
int ck = parser.compare(abuf.buffer, bbuf.buffer);
abuf.positionAtRecord(rec);
bbuf.positionAtRecord(rec + 1);
outbuf.positionAtRecord(outpos++);
if (ck <= 0) {
    outbuf.copyRecord(abuf.buffer);
    outbuf.positionAtRecord(outpos++);
    outbuf.copyRecord(bbuf.buffer);
} else {
    outbuf.copyRecord(bbuf.buffer);
    outbuf.positionAtRecord(outpos++);
    outbuf.copyRecord(abuf.buffer);
}
```
Then near the end of the method, the increment in outpos is pointless:
```
if (max < recordCount) {
    abuf.positionAtRecord(max);
    outbuf.positionAtRecord(outpos++);
    outbuf.copyRecord(abuf.buffer);
}
```
A for loop would be more natural for this piece:
```
long pos = 0;
while (pos < size) {
    locateWindow(pos);
    source.locateWindow(pos);
    buffer.put(source.buffer);
    pos += windowSize;
}
```
The initialization of tmp is pointless here, because you overwrite it anyway inside the loop:
```
FastFile tmp = null;
for (int bs = 2; bs < recordCount; bs <<= 1) {
    mergeSort(from, to, bs);
    tmp = from;
    from = to;
    to = tmp;
}
```
The call to super() is pointless in the FileBuffer class:
```
public FileBuffer(FileChannel channel, MapMode mode, long windowSize, long size,
        int recordLength, int recordsPerWindow) {
    super();
```
Pointless final modifiers
What's the point of the final modifier on a private static method like this one:
```
private static final int calculateWindow(final int approxSize, final int recordlength) {
```
Maybe I don't see something but it seems pointless.
Actually, even on public static methods, I find the purpose of final questionable. When a method is static, a sub-class already cannot override it. Sure, a sub-class could still hide it by declaring a static method with the same signature, but is that a realistic concern? If a sub-class hides a parent's static method, how would that be a problem? A static final method seems a bit excessively cautious.
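For readers unsure what hiding a static method looks like in practice, here is a small self-contained sketch. The Parent and Child classes and their method names are invented for illustration and are not part of the reviewed code.

```java
public class StaticHidingDemo {

    static class Parent {
        // A plain static method: a sub-class may declare its own version, which hides this one.
        static String name() {
            return "parent";
        }

        // A static final method: a sub-class declaring static String locked()
        // would be rejected by the compiler, which is the only thing final adds here.
        static final String locked() {
            return "locked";
        }
    }

    static class Child extends Parent {
        // Hides (does not override) Parent.name(); the call target is chosen at compile time.
        static String name() {
            return "child";
        }
    }

    public static void main(String[] args) {
        System.out.println(Parent.name());  // resolved against Parent
        System.out.println(Child.name());   // resolved against Child
        System.out.println(Child.locked()); // inherited from Parent, cannot be hidden
    }
}
```

Since static calls are bound to the class named at the call site, the hidden Parent.name() keeps working for every caller that uses Parent, which supports the point that hiding is rarely a practical problem.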
Context
StackExchange Code Review Q#61586, answer score: 3