
Bash script to convert NIST vectors to debug scripts

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·



Problem

TL;DR: The Bash script converts a published, somewhat-structured text file into a format usable by my testing infrastructure. It's slow and, I think, ugly, although it is fully functional.

NIST provides test vectors for verifying the correct operation of Galois/Counter Mode (GCM) when used with the AES block cipher (I only care about the 128-bit key files, and have not looked into the format of the other files).

In order to actually use these test vectors for automated testing of my GCM-AES implementation, I have to convert them from the RSP files they come in into debug scripts that my chip simulator (mspdebug) can use. The test vectors ultimately interact with other scripts and with driver code as well, but for this problem it is sufficient to set each variable from the RSP file in memory with an mw command (e.g., "PT = 010203" should become "mw PT 0x01 0x02 0x03"), as long as each group of tests that share common values is written to a separate file.
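For reference, the per-line rule can be sketched in plain Bash using only parameter expansion, so no external process is started per line. This is a minimal sketch, not the script from the question: the function name `rsp_line_to_mw` is hypothetical, and it assumes the separator is exactly " = " and that byte-valued fields (Key, IV, PT, AAD, CT, Tag) hold lowercase hex with an even number of digits.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn one RSP body line into an mspdebug "mw" command.
# Byte-valued fields (Key, IV, PT, AAD, CT, Tag) holding lowercase hex are split
# into 0x-prefixed bytes; other values are passed through unchanged, and a line
# without " = " (such as "FAIL") becomes "mw NAME" with no value.
rsp_line_to_mw() {
    local line=$1 name=$1 value="" out i
    if [[ $line == *" = "* ]]; then
        name=${line%% = *}     # text before the first " = "
        value=${line#* = }     # text after the first " = "
    fi
    out="mw $name"
    if [[ $name =~ ^(Key|IV|PT|AAD|CT|Tag)$ && $value =~ ^([0-9a-f][0-9a-f])+$ ]]; then
        # Walk the hex string two characters (one byte) at a time.
        for (( i = 0; i < ${#value}; i += 2 )); do
            out+=" 0x${value:i:2}"
        done
    elif [[ -n $value ]]; then
        out+=" $value"
    fi
    printf '%s\n' "$out"
}

rsp_line_to_mw "PT = 010203"   # prints: mw PT 0x01 0x02 0x03
```

Restricting the byte split to the six named fields avoids turning something like "Count = 10" into a hex byte by accident.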

As an operational example, this section of the gcmEncryptExtIV128.rsp input file generates this output. Note that the input file generates 525 such files, from its 525 corresponding sections. The script as written will not work on just the subsection linked above, because the complete RSP file has some extra junk at the start that gets trimmed out (though you can probably get it to work with some fiddling). Note also that none of the encryption tests are marked with "FAIL": this token occurs only within the decryption tests, but the contents of the two RSP files are otherwise identical. For the sake of uniformity, each encryption test (as well as each non-failing decryption test) has an output line of mw FAIL 0.
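The FAIL rule only needs one flag that is reset at the start of each test and flushed when the test ends, so in principle it can be enforced in the same single pass that converts the lines. A minimal sketch, assuming each test starts at a "Count = " line, a bare "FAIL" line marks a failing test, and the input has Unix line endings; `emit_fail_flags` is a hypothetical name, and to stay short it prints only the FAIL line for each test.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: read RSP body lines on stdin and print one
# "mw FAIL 0|1" line per test, in a single pass with one state variable.
emit_fail_flags() {
    awk '
        # A new "Count = N" line closes the previous test, if there was one.
        /^Count = / { if (in_test) print "mw FAIL", fail; in_test = 1; fail = 0; next }
        # A bare FAIL line marks the current test as an expected failure.
        /^FAIL$/    { fail = 1 }
        # Flush the last test at end of input.
        END         { if (in_test) print "mw FAIL", fail }
    '
}

printf '%s\n' 'Count = 0' 'CT = 00' 'Count = 1' 'CT = 01' 'FAIL' | emit_fail_flags
# prints:
# mw FAIL 0
# mw FAIL 1
```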

The script is incredibly slow (it takes ~15 minutes to run against a single RSP file on a modest modern machine), primarily because of the sed expression for ensuring that every test block has a FAIL setting. It does, however, correctly spit out each t

Solution

Rewriting it in AWK would be a huge improvement, enough to say that Bash was a poor choice for this task. Many of the considerations for this problem favour AWK:

The input is line-oriented.

Nearly every line has the key = value format; the exceptions are the headers, which use [key = value] instead. Most importantly, they all share the same = delimiter.

All of the processing can be done using simple text transformations and arithmetic.

Processing can be done in one pass, with very little state to maintain.
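The shared delimiter point can be checked directly: once the brackets are stripped, header and body lines split into the same two fields with a single field separator. This is a small illustrative sketch; `split_rsp_fields` is a hypothetical name, and the "|" in the output is only there to make the field boundary visible.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: with FS set to " = ", header lines (after removing the
# surrounding brackets) and body lines split into the same two fields.
split_rsp_fields() {
    awk -F ' = ' '
        { sub(/\r$/, "") }                   # tolerate CRLF line endings
        /^\[.*\]$/ { gsub(/^\[|\]$/, "") }   # "[Keylen = 128]" -> "Keylen = 128"
        NF == 2 { print $1 "|" $2 }          # skip blank lines and bare "FAIL"
    '
}

printf '%s\n' '[Keylen = 128]' '' 'Key = 00ff' 'FAIL' | split_rsp_fields
# prints:
# Keylen|128
# Key|00ff
```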

I think that Bash is underpowered for this problem, and is therefore a poor fit. The repeated use of sed is a performance barrier, and the constant intermingling of Bash and sed also hurts readability.
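To illustrate the general cost of repeated sed calls, compare two ways of applying the same substitution. This is a generic illustration, not the asker's script (which is not reproduced here); `hex_bytes_per_line` and `hex_bytes_single_pass` are hypothetical names, and `sed -E` assumes GNU or BSD sed.

```bash
#!/usr/bin/env bash
# Illustration only: the same sed substitution, run once per line versus once
# for the whole stream. The output is identical; the first form starts a new
# sed process for every input line.
SED_HEX='s/([0-9a-f]{2})/0x\1 /g; s/ $//'   # "0102" -> "0x01 0x02"

hex_bytes_per_line() {
    local line
    while IFS= read -r line; do
        printf '%s\n' "$line" | sed -E "$SED_HEX"   # one sed process per line
    done
}

hex_bytes_single_pass() {
    sed -E "$SED_HEX"                                # one sed process in total
}
```

Running both under `time` on a file with a few thousand lines should make the per-process overhead visible; the single-pass shape is the one the AWK program below follows.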

Of course, any other general-purpose programming language would also work. However, considering that any system that has Bash will also have AWK, and AWK is just powerful enough to handle this problem comfortably, that's what I would choose. Besides, you already used a tiny bit of AWK within your Bash script — why not go all the way? ☺

The AWK program below is much faster than your Bash script and, in my opinion, more readable. That said, there are some minor improvements that could be made to the Bash-based solution; I may eventually return to review it.


Code Snippets

#!/usr/bin/awk -f

BEGIN {
    FS = " = ";
    NUM_HEADERS = 0;
}

######################################################################
# Skip first 6 lines
######################################################################
FNR < 7 { next }

######################################################################
# dos2unix
######################################################################
{ sub("\r$", ""); }

######################################################################
# Read headers, of the form
# [Keylen = 96]
######################################################################
/\[.*\]/ {
    gsub("\\[|\\]", "");
    HEADER_NAME[NUM_HEADERS++] = $1;
    HEADER_VALUE[$1] = $2;
    next;
}

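######################################################################
# End of headers.  Determine output file, and write out the headers.
# Output filename is of the form
# [BASEFILE][Keylen]-[IVlen]-[PTlen]-[AADlen]-[Taglen].mspd
# This rule fires on the first non-header line after a block of
# headers; "next" discards that line, so it is expected to be the
# blank separator line that follows the headers.
######################################################################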
NUM_HEADERS > 0 {
    if (OUT) {
        end_of_stanza();
        close(OUT);
    }
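    # Strip the extension and any trailing digits from the input name.
    # Note: this assumes FILENAME contains no other dots (e.g. no "./"
    # prefix), because everything from the first "." is removed.
    basename = FILENAME;
    sub("\\..*", "", basename);
    sub("[0-9]*$", "", basename);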
    OUT = sprintf("%s%d-%d-%d-%d-%d.mspd",
                  basename,
                  HEADER_VALUE["Keylen"],
                  HEADER_VALUE["IVlen"],
                  HEADER_VALUE["PTlen"],
                  HEADER_VALUE["AADlen"],
                  HEADER_VALUE["Taglen"]);

    for (h = 0; h < NUM_HEADERS; h++) {
        header_name = HEADER_NAME[h];
        hex_value = sprintf("%04x", HEADER_VALUE[header_name]);
        printf "mw %s 0x%s 0x%s\n", header_name, substr(hex_value, 3, 2), substr(hex_value, 1, 2) > OUT;
    }
    NUM_HEADERS = 0;
    FAIL = "";
    next;
}

######################################################################
# Split values of Key, IV, PT, AAD, CT, and Tag into hex bytes
######################################################################
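$1 ~ /^(Key|IV|PT|AAD|CT|Tag)$/ && $2 ~ /^([0-9a-f][0-9a-f])+$/ {
    # Walk the hex string two characters at a time with substr(), which
    # is portable; split() with an empty separator and length() of an
    # array are extensions that POSIX awk does not guarantee.
    hex = "";
    for (i = 1; i < length($2); i += 2) {
        hex = hex " 0x" substr($2, i, 2);
    }
    $2 = substr(hex, 2);
}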

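######################################################################
# Stanza processing: mark failure or non-failure.
# FAIL is "" right after a header block, so the first Count line of a
# group only prints the blank separator; every later call first writes
# the stanza's "mw FAIL" and "read gcm_test_round.mspd" lines.
######################################################################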
function end_of_stanza() {
    if (FAIL != "") {
        print "mw FAIL", FAIL > OUT;
        print "read gcm_test_round.mspd\n" > OUT;
    }
    FAIL = "0";
    print "" > OUT;
}

$1 == "FAIL" {
    FAIL = "1";
    next;
}
$1 == "Count" {
    end_of_stanza();
    next;
}
END {
    end_of_stanza();
    close(OUT);
}

######################################################################
# Normal body line
######################################################################
!/^$/ {
    if ($2 == "") {
        print "mw", $1 > OUT;
    } else {
        print "mw", $1, $2 > OUT;
    }
}
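Two pieces of the header rule are easy to check by hand: the output filename and the 16-bit little-endian encoding produced by `sprintf("%04x")` plus the two `substr()` calls (low byte first). The Bash functions below are hypothetical equivalents for that purpose only, assuming each header value fits in 16 bits.

```bash
#!/usr/bin/env bash
# Hypothetical Bash equivalents of two pieces of the AWK header rule.

# Output filename: drop everything from the first "." and any trailing
# digits from the input name, then append the five header values.
mspd_filename() {
    local input=$1 keylen=$2 ivlen=$3 ptlen=$4 aadlen=$5 taglen=$6
    local base=${input%%.*}             # "gcmEncryptExtIV128.rsp" -> "gcmEncryptExtIV128"
    while [[ $base == *[0-9] ]]; do     # drop trailing digits -> "gcmEncryptExtIV"
        base=${base%?}
    done
    printf '%s%d-%d-%d-%d-%d.mspd\n' "$base" "$keylen" "$ivlen" "$ptlen" "$aadlen" "$taglen"
}

# Header value as a little-endian 16-bit "mw" command (low byte first).
header_to_mw() {
    local hex
    printf -v hex '%04x' "$2"           # e.g. 408 -> "0198"
    printf 'mw %s 0x%s 0x%s\n' "$1" "${hex:2:2}" "${hex:0:2}"
}

mspd_filename gcmEncryptExtIV128.rsp 128 96 0 0 128   # gcmEncryptExtIV128-96-0-0-128.mspd
header_to_mw Keylen 128                               # mw Keylen 0x80 0x00
header_to_mw PTlen 408                                # mw PTlen 0x98 0x01
```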

Context

StackExchange Code Review Q#47631, answer score: 2
