HiveBrain v1.2.0
Get Started
← Back to all entries
patterncMinor

Speed optimisation and general tips for base64 encoding/decoding functions in C

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
functionstipsoptimisationgeneralforbase64encodinganddecodingspeed

Problem

So far, the below code appears to work well. It operates pretty fast, but I was wondering if it's possible to make it faster. I'm also looking for general tips on what I might be doing wrong and what I could do better. I have added in comments to explain certain decisions.

b64.h

```
#include
#include
#include
#include

static const char encode_table[] = \
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";

char ascii_to_base64(char asciistr)
{
int asciistrlen = strlen(asciistr);
int convertedlen = 4 * ceil((double)asciistrlen / 3.0);
char asciibuff = (char)malloc(convertedlen + 1);
if (asciibuff == NULL)
goto exception;
char *converted = malloc(convertedlen + 1);
if (converted == NULL)
goto exception;

/*
* By zero-ing out, I can just check 0x00 bytes and convert to padding
* without any extra checks.
*/
memset(asciibuff, 0x00, asciistrlen + 1);
memset(converted, 0x00, convertedlen + 1);
/*
* Can't realloc the asciistr var so have to create individual buffer
* with same values.
*/
strcpy(asciibuff, asciistr);
/*
* Allows me to loop through original string input AND be able to check
* for null bytes and pad accordingly without running into overwriting
* problems caused by using the "converted" variable for both jobs.
*/

int i, j;
for (i = 0, j = 0; i > 6) : 64];
converted[i+1] = encode_table[asciibuff[j] != 0x00 ?
((asciistr[j] & 0x03) > 4) : 64];
converted[i] = encode_table[asciibuff[j] != 0x00 ? asciibuff[j] >> 2 : 64];
}

free(asciibuff);
return (converted);
exception:
return (NULL);
}

/ Derive encoding index /
static unsigned int enci(int base64ascii)
{
if (base64ascii = 0x41)
return ((unsigned)(base64ascii - 0x41));
else if (base64ascii = 0x61)
return ((unsigned)(base64ascii - 0x47));
else if (base64ascii = 0x30)
return

Solution

I only really looked at ascii_to_base64.

Firstly, you say that the code is "pretty fast", but why is that a
consideration? Don't assume that execution speed is necessarily important.
Readability and correctness are more so.

On correctness, your code does not handle padding correctly. Given the string
"1" your code produces "MQ=K" where it should produce "MQ==" (with "11" it
correctly produces "MTE="). This seems to come from using the wrong length in
the first memset:

memset(asciibuff, 0x00, asciistrlen + 1);
memset(converted, 0x00, convertedlen + 1);


If you use convertedlen for both, the problem is resolved. Better still,
use calloc instead of malloc followed by memset.

Another point is that you use strcpy to copy the input string, but you
already know its length; memcpy is more efficient when the length is known.

Note that it is possible to encode the input string without copying it into a bigger buffer - you just have to be careful not to access beyond the end of the input string.

On readability, I have a little trouble with the function. The variable names
are too long for my liking (they are local to a short function) and the four
expressions in the for-loop are difficult to read. I dislike your use of the
ternary operator (?:) in these expressions as it makes them hard to
decode. And you mix access to asciibuff and asciistr (all should be to the
former).

Your bit-twiddling is okay, with good use of bracketing, although the 0xbf
should be 0x3f in the 1st line of the for-loop. I would prefer to see you
combining the values with '|' not '+'; both will work but when combining
non-numeric values '|' is more logical.

I prefer not to use goto, as you have. And if I were to use it I'd make
sure not to leak asciibuff if allocation of converted failed.

Finally, your use of floating point to round-up when computing convertedlen
is unnecessary. Instead of your:

int convertedlen = 4 * ceil((double)asciistrlen / 3.0);


I would expect something like:

int convertedlen = 4 * ((asciistrlen + 2) / 3);


BTW, your code has all sorts of singed/unsigned and loss of precision conversion warnings (gcc option -Wconversion).

EDIT : Variable naming is difficult. It is also open to argument, so I can only give
you my preferences which are based on the widely found advice that the length
of names should related to their scope. So variables in a small function can
be small; those in a large function can be longer (but large functions should
generally be avoided anyway). Global variables need to be larger still. But
these are guidlelines, not fixed rules. If t1 has global significance to
your application then using t1 as a global might be sensible; you wouldn't after
all call pi anything but pi.

In the case of a smallish function like ascii_to_base64, I would call the
input string s - the function name says it is an ASCII string so no need to
repeat that. For the copy of s, which is used everywhere, I'd use ascii.
The converted value I'd call base64. For the lengths: the length of s is
used just once and needs no variable; the length of the conversion I'd call
just len or maybe, len64. And the base-64 coding table, which in your
example is used only in this function and so should be local to the function,
not global, I'd call coding.

Really small names, such as your loop variables i and j are also okay (in
fact they are preferable for small loops), but when mixed together can be
difficult to tell apart. I have this problem with your for-loop, as it is not
easy to tell at a glance that i is used only in the left-hand side of each
line. You might use i and k, which are more easily distinguished. Or you
could refactor the loop to avoid having two indices.

const char *a = ascii;
for (int i = 0; i > 6) : 64];
    base64[i+1] = coding[*a   != 0 ? ((*a   & 3)  > 4) : 64];
    base64[i]   = coding[*a   != 0 ?   *a >> 2 : 64];
}


(but note that I wouldnt have coded the loop that way in the first place).

Note also that
0 is just as good as 0x00 and 3 is as good as 0x03`; the
shorter versions are easier to read.

Code Snippets

memset(asciibuff, 0x00, asciistrlen + 1);
memset(converted, 0x00, convertedlen + 1);
int convertedlen = 4 * ceil((double)asciistrlen / 3.0);
int convertedlen = 4 * ((asciistrlen + 2) / 3);
const char *a = ascii;
for (int i = 0; i < len; i += 4, a += 3)
{
    base64[i+3] = coding[a[2] != 0 ?   a[2] & 63 : 64];
    base64[i+2] = coding[a[1] != 0 ? ((a[1] & 15) << 2) | (a[2] >> 6) : 64];
    base64[i+1] = coding[*a   != 0 ? ((*a   & 3)  << 4) | (a[1] >> 4) : 64];
    base64[i]   = coding[*a   != 0 ?   *a >> 2 : 64];
}

Context

StackExchange Code Review Q#26826, answer score: 5

Revisions (0)

No revisions yet.