
Java hex dumper

Submitted by: @import:stackexchange-codereview

Problem

Can somebody please try and help me speed up my code? The file is ~12MB (you can download it here). It takes around 500-600 milliseconds to run on my i7 4790k.

```
import java.io.FileInputStream;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Created by Jonathan on 4/1/2016.
 */
public class HexDumper {

    public static void main(String[] args) throws Exception {
        Deque<String> lines = new ArrayDeque<>(1_000_000);
        lines.add("Address 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F\n");

        long s = System.currentTimeMillis();
        FileChannel channel = new FileInputStream("client.dll").getChannel();
        ByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        byte[] bytes = new byte[16];
        int offset = 0;
        while (buffer.remaining() > 0) {
            buffer.get(bytes);
            lines.add(new DataRow(offset, bytes).toString());
            offset += 16;
        }

        System.out.println(System.currentTimeMillis() - s);
        Files.write(Paths.get("dump.txt"), lines);
    }

    public static class DataRow {

        private final int offset;
        private final byte[] values;

        public DataRow(int offset, byte[] values) {
            this.offset = offset;
            this.values = values;
        }

        public int offset() {
            return offset;
        }

        public byte[] values() {
            return values;
        }

        private final static char[] HEX_ARRAY = "0123456789ABCDEF".toCharArray();

        private static String bytesToHex(byte[] bytes, int bundleSize) {
            char[] hexChars = new char[(bytes.length * 2) + (bytes.length / bundleSize)];
            // NOTE: the text between '<' and '>>>' was lost in extraction. The loop header
            // and the next line are reconstructed; the lines defining v and idx are missing.
            for (int j = 0, k = 1; j < bytes.length; j++, k++) {
                hexChars[idx] = HEX_ARRAY[v >>> 4];
                hexChars[idx + 1] = HEX_ARRAY[v & 0x0F];
                if ((k % bundleSize) == 0) {
                    // [the rest of the listing is truncated in the source]
```

Solution

The slowness

For each 16 bytes, the program creates:

  • a new DataRow object
  • a BigInteger
  • a char[]
  • several String objects

Yeah I can see how that might be slow with large inputs...
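
To put a rough number on that, here is a back-of-the-envelope estimate. It assumes the file is exactly 12 MiB and counts only one String per row; both are assumptions, since the question only says "~12MB" and "several" is not quantified. The class and method names are made up for illustration.

```java
final class AllocationEstimate {

    /** Number of 16-byte rows needed to cover the file (last row may be partial). */
    static long rows(long fileSizeBytes) {
        return (fileSizeBytes + 15) / 16;
    }

    /** Lower bound: one DataRow, one BigInteger, one char[] and one String per row. */
    static long minAllocations(long fileSizeBytes) {
        return rows(fileSizeBytes) * 4;
    }

    public static void main(String[] args) {
        long size = 12L * 1024 * 1024;             // assumed file size: 12 MiB
        System.out.println(rows(size));            // 786432
        System.out.println(minAllocations(size));  // 3145728
    }
}
```

So even under these conservative assumptions, the loop allocates over three million short-lived objects.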

Memory use

The program is not very streamlined. It reads all the input into memory, builds the entire output in the desired format, and only then writes everything out.
You could reduce the memory footprint dramatically by writing each line as you go. (Eliminate the Deque.)
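
A minimal sketch of what writing as you go could look like, assuming Java 9+ (for `InputStream.readNBytes`) and the same 16-bytes-per-row layout. The class name `StreamingHexDump`, the `dump` method and the exact row format are illustrative, not your code, and the header line is left out for brevity.

```java
import java.io.BufferedInputStream;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Paths;

final class StreamingHexDump {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    /** Writes one line per 16 input bytes; only a single row is held in memory. */
    static void dump(InputStream in, Writer out) throws IOException {
        byte[] row = new byte[16];
        char[] line = new char[8 + 3 * 16 + 1]; // address, " XX" per byte, newline
        int offset = 0;
        int n;
        // readNBytes returns 0 (not -1) once the end of the stream is reached.
        while ((n = in.readNBytes(row, 0, row.length)) > 0) {
            int pos = 0;
            for (int shift = 28; shift >= 0; shift -= 4) {
                line[pos++] = HEX[(offset >>> shift) & 0xF];
            }
            for (int i = 0; i < n; i++) {
                line[pos++] = ' ';
                line[pos++] = HEX[(row[i] >> 4) & 0xF];
                line[pos++] = HEX[row[i] & 0xF];
            }
            line[pos++] = '\n';
            out.write(line, 0, pos);
            offset += n;
        }
    }

    // Hypothetical usage: java StreamingHexDump <input file> <output file>
    public static void main(String[] args) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(Paths.get(args[0])));
             BufferedWriter out = Files.newBufferedWriter(Paths.get(args[1]))) {
            dump(in, out);
        }
    }
}
```

Here the buffered writer takes the place of the Deque, so memory use no longer grows with the size of the input.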

Design

The toString method is not designed for advanced custom formatting.
It's designed to return a simple and fast representation, typically used in debugging. To render an object's data in a custom format, it's recommended to have a dedicated method.

Instead of a class containing a value and an offset that is used only once when formatting, it would be more natural to have a simple utility function that takes these values as parameters.

Suggested implementation

This function returns the same string as new HexDumper.DataRow(...).toString(), but is a lot simpler:

private String hexdumpLine(int offset, byte[] bytes) {
    StringBuilder builder = new StringBuilder(8 + 3 * 16);
    builder.append(String.format("%08X", offset));
    for (byte b : bytes) {
        builder.append(' ').append(String.format("%02X", b));
    }
    return builder.toString();
}


The simplicity comes from a "cheat": String.format does both the hexadecimal conversion of each byte and the 0-padding of the address.
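
For reference, a small sketch of how these format strings behave; the class and method names are made up for illustration. Note that a bare %X does not pad, so a byte such as 0x0A would print as a single digit, and that for a byte argument negative values are rendered as their unsigned two-digit form.

```java
final class HexFormatDemo {

    /** Zero-padded, 8-digit upper-case hex, as used for the address column. */
    static String address(int offset) {
        return String.format("%08X", offset);
    }

    /** Zero-padded, 2-digit upper-case hex for a single byte. */
    static String hexByte(byte b) {
        return String.format("%02X", b);
    }

    public static void main(String[] args) {
        System.out.println(address(0x1F0));                    // 000001F0
        System.out.println(hexByte((byte) 0x0A));              // 0A
        System.out.println(hexByte((byte) 0xFF));              // FF
        System.out.println(String.format("%X", (byte) 0x0A));  // A (no padding)
    }
}
```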

Unfortunately, String.format is slow, and according to your measurements this solution is actually slower than your original.
So we're better off hand-crafting that part instead:

private String hexdumpLine(int offset, byte[] bytes) {
    StringBuilder builder = new StringBuilder(8 + 3 * 16);
    appendPaddedHexFormat(builder, offset);
    for (byte b : bytes) {
        builder.append(' ')
                .append(toHexDigit((b >> 4) & 0xF))
                .append(toHexDigit(b & 0xF));
    }
    return builder.toString();
}

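// Maps a value in the range 0..15 to its upper-case hex digit.
// HEX_ARRAY is the "0123456789ABCDEF" lookup table from the original code.
private char toHexDigit(int value) {
    return HEX_ARRAY[value];
}

// Resizes the builder to exactly 8 characters and overwrites them
// with the zero-padded hex digits of offset.
private void appendPaddedHexFormat(StringBuilder builder, int offset) {
    builder.setLength(8);
    int value = offset;
    // Fill from the least significant digit (rightmost position) backwards.
    for (int i = 7; i >= 0; --i) {
        builder.setCharAt(i, toHexDigit(value & 0xF));
        value >>= 4;
    }
}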


This should be faster than the original, because it creates far fewer objects.
As a further optimization, you could reuse the same StringBuilder by calling setLength(0) after (or before) each use.
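
One way the reuse could look, as a sketch only: the class name `ReusableLineFormatter` is hypothetical, and the address is appended digit by digit instead of going through appendPaddedHexFormat, purely to keep the example self-contained.

```java
final class ReusableLineFormatter {

    private static final char[] HEX_ARRAY = "0123456789ABCDEF".toCharArray();

    // One builder for all rows; its internal buffer is allocated once.
    private final StringBuilder builder = new StringBuilder(8 + 3 * 16);

    String hexdumpLine(int offset, byte[] bytes) {
        builder.setLength(0); // drop the previous row but keep the buffer
        for (int shift = 28; shift >= 0; shift -= 4) {
            builder.append(HEX_ARRAY[(offset >>> shift) & 0xF]);
        }
        for (byte b : bytes) {
            builder.append(' ')
                   .append(HEX_ARRAY[(b >> 4) & 0xF])
                   .append(HEX_ARRAY[b & 0xF]);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        ReusableLineFormatter formatter = new ReusableLineFormatter();
        System.out.println(formatter.hexdumpLine(0x10, new byte[]{(byte) 0xAB, 0x01})); // 00000010 AB 01
    }
}
```

Because the builder is shared state, an instance like this should not be used from several threads at once.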

Context

StackExchange Code Review Q#124523, answer score: 6
