patterncMinor
Speed optimisation and general tips for base64 encoding/decoding functions in C
Viewed 0 times
functionstipsoptimisationgeneralforbase64encodinganddecodingspeed
Problem
So far, the below code appears to work well. It operates pretty fast, but I was wondering if it's possible to make it faster. I'm also looking for general tips on what I might be doing wrong and what I could do better. I have added in comments to explain certain decisions.
b64.h
```
#include
#include
#include
#include
static const char encode_table[] = \
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";
char ascii_to_base64(char asciistr)
{
int asciistrlen = strlen(asciistr);
int convertedlen = 4 * ceil((double)asciistrlen / 3.0);
char asciibuff = (char)malloc(convertedlen + 1);
if (asciibuff == NULL)
goto exception;
char *converted = malloc(convertedlen + 1);
if (converted == NULL)
goto exception;
/*
* By zero-ing out, I can just check 0x00 bytes and convert to padding
* without any extra checks.
*/
memset(asciibuff, 0x00, asciistrlen + 1);
memset(converted, 0x00, convertedlen + 1);
/*
* Can't realloc the asciistr var so have to create individual buffer
* with same values.
*/
strcpy(asciibuff, asciistr);
/*
* Allows me to loop through original string input AND be able to check
* for null bytes and pad accordingly without running into overwriting
* problems caused by using the "converted" variable for both jobs.
*/
int i, j;
for (i = 0, j = 0; i > 6) : 64];
converted[i+1] = encode_table[asciibuff[j] != 0x00 ?
((asciistr[j] & 0x03) > 4) : 64];
converted[i] = encode_table[asciibuff[j] != 0x00 ? asciibuff[j] >> 2 : 64];
}
free(asciibuff);
return (converted);
exception:
return (NULL);
}
/ Derive encoding index /
static unsigned int enci(int base64ascii)
{
if (base64ascii = 0x41)
return ((unsigned)(base64ascii - 0x41));
else if (base64ascii = 0x61)
return ((unsigned)(base64ascii - 0x47));
else if (base64ascii = 0x30)
return
b64.h
```
#include
#include
#include
#include
static const char encode_table[] = \
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";
char ascii_to_base64(char asciistr)
{
int asciistrlen = strlen(asciistr);
int convertedlen = 4 * ceil((double)asciistrlen / 3.0);
char asciibuff = (char)malloc(convertedlen + 1);
if (asciibuff == NULL)
goto exception;
char *converted = malloc(convertedlen + 1);
if (converted == NULL)
goto exception;
/*
* By zero-ing out, I can just check 0x00 bytes and convert to padding
* without any extra checks.
*/
memset(asciibuff, 0x00, asciistrlen + 1);
memset(converted, 0x00, convertedlen + 1);
/*
* Can't realloc the asciistr var so have to create individual buffer
* with same values.
*/
strcpy(asciibuff, asciistr);
/*
* Allows me to loop through original string input AND be able to check
* for null bytes and pad accordingly without running into overwriting
* problems caused by using the "converted" variable for both jobs.
*/
int i, j;
for (i = 0, j = 0; i > 6) : 64];
converted[i+1] = encode_table[asciibuff[j] != 0x00 ?
((asciistr[j] & 0x03) > 4) : 64];
converted[i] = encode_table[asciibuff[j] != 0x00 ? asciibuff[j] >> 2 : 64];
}
free(asciibuff);
return (converted);
exception:
return (NULL);
}
/ Derive encoding index /
static unsigned int enci(int base64ascii)
{
if (base64ascii = 0x41)
return ((unsigned)(base64ascii - 0x41));
else if (base64ascii = 0x61)
return ((unsigned)(base64ascii - 0x47));
else if (base64ascii = 0x30)
return
Solution
I only really looked at
Firstly, you say that the code is "pretty fast", but why is that a
consideration? Don't assume that execution speed is necessarily important.
Readability and correctness are more so.
On correctness, your code does not handle padding correctly. Given the string
"1" your code produces "MQ=K" where it should produce "MQ==" (with "11" it
correctly produces "MTE="). This seems to come from using the wrong length in
the first memset:
If you use
use
Another point is that you use
already know its length;
Note that it is possible to encode the input string without copying it into a bigger buffer - you just have to be careful not to access beyond the end of the input string.
On readability, I have a little trouble with the function. The variable names
are too long for my liking (they are local to a short function) and the four
expressions in the for-loop are difficult to read. I dislike your use of the
ternary operator (?:) in these expressions as it makes them hard to
decode. And you mix access to
former).
Your bit-twiddling is okay, with good use of bracketing, although the 0xbf
should be 0x3f in the 1st line of the for-loop. I would prefer to see you
combining the values with '|' not '+'; both will work but when combining
non-numeric values '|' is more logical.
I prefer not to use
sure not to leak
Finally, your use of floating point to round-up when computing
is unnecessary. Instead of your:
I would expect something like:
BTW, your code has all sorts of singed/unsigned and loss of precision conversion warnings (gcc option -Wconversion).
EDIT : Variable naming is difficult. It is also open to argument, so I can only give
you my preferences which are based on the widely found advice that the length
of names should related to their scope. So variables in a small function can
be small; those in a large function can be longer (but large functions should
generally be avoided anyway). Global variables need to be larger still. But
these are guidlelines, not fixed rules. If
your application then using
all call
In the case of a smallish function like
input string
repeat that. For the copy of
The converted value I'd call
used just once and needs no variable; the length of the conversion I'd call
just
example is used only in this function and so should be local to the function,
not global, I'd call
Really small names, such as your loop variables
fact they are preferable for small loops), but when mixed together can be
difficult to tell apart. I have this problem with your for-loop, as it is not
easy to tell at a glance that
line. You might use
could refactor the loop to avoid having two indices.
(but note that I wouldn
shorter versions are easier to read.
ascii_to_base64.Firstly, you say that the code is "pretty fast", but why is that a
consideration? Don't assume that execution speed is necessarily important.
Readability and correctness are more so.
On correctness, your code does not handle padding correctly. Given the string
"1" your code produces "MQ=K" where it should produce "MQ==" (with "11" it
correctly produces "MTE="). This seems to come from using the wrong length in
the first memset:
memset(asciibuff, 0x00, asciistrlen + 1);
memset(converted, 0x00, convertedlen + 1);If you use
convertedlen for both, the problem is resolved. Better still,use
calloc instead of malloc followed by memset.Another point is that you use
strcpy to copy the input string, but youalready know its length;
memcpy is more efficient when the length is known.Note that it is possible to encode the input string without copying it into a bigger buffer - you just have to be careful not to access beyond the end of the input string.
On readability, I have a little trouble with the function. The variable names
are too long for my liking (they are local to a short function) and the four
expressions in the for-loop are difficult to read. I dislike your use of the
ternary operator (?:) in these expressions as it makes them hard to
decode. And you mix access to
asciibuff and asciistr (all should be to theformer).
Your bit-twiddling is okay, with good use of bracketing, although the 0xbf
should be 0x3f in the 1st line of the for-loop. I would prefer to see you
combining the values with '|' not '+'; both will work but when combining
non-numeric values '|' is more logical.
I prefer not to use
goto, as you have. And if I were to use it I'd makesure not to leak
asciibuff if allocation of converted failed. Finally, your use of floating point to round-up when computing
convertedlenis unnecessary. Instead of your:
int convertedlen = 4 * ceil((double)asciistrlen / 3.0);I would expect something like:
int convertedlen = 4 * ((asciistrlen + 2) / 3);BTW, your code has all sorts of singed/unsigned and loss of precision conversion warnings (gcc option -Wconversion).
EDIT : Variable naming is difficult. It is also open to argument, so I can only give
you my preferences which are based on the widely found advice that the length
of names should related to their scope. So variables in a small function can
be small; those in a large function can be longer (but large functions should
generally be avoided anyway). Global variables need to be larger still. But
these are guidlelines, not fixed rules. If
t1 has global significance toyour application then using
t1 as a global might be sensible; you wouldn't afterall call
pi anything but pi.In the case of a smallish function like
ascii_to_base64, I would call theinput string
s - the function name says it is an ASCII string so no need torepeat that. For the copy of
s, which is used everywhere, I'd use ascii.The converted value I'd call
base64. For the lengths: the length of s isused just once and needs no variable; the length of the conversion I'd call
just
len or maybe, len64. And the base-64 coding table, which in yourexample is used only in this function and so should be local to the function,
not global, I'd call
coding.Really small names, such as your loop variables
i and j are also okay (infact they are preferable for small loops), but when mixed together can be
difficult to tell apart. I have this problem with your for-loop, as it is not
easy to tell at a glance that
i is used only in the left-hand side of eachline. You might use
i and k, which are more easily distinguished. Or youcould refactor the loop to avoid having two indices.
const char *a = ascii;
for (int i = 0; i > 6) : 64];
base64[i+1] = coding[*a != 0 ? ((*a & 3) > 4) : 64];
base64[i] = coding[*a != 0 ? *a >> 2 : 64];
}(but note that I wouldn
t have coded the loop that way in the first place).
Note also that 0 is just as good as 0x00 and 3 is as good as 0x03`; theshorter versions are easier to read.
Code Snippets
memset(asciibuff, 0x00, asciistrlen + 1);
memset(converted, 0x00, convertedlen + 1);int convertedlen = 4 * ceil((double)asciistrlen / 3.0);int convertedlen = 4 * ((asciistrlen + 2) / 3);const char *a = ascii;
for (int i = 0; i < len; i += 4, a += 3)
{
base64[i+3] = coding[a[2] != 0 ? a[2] & 63 : 64];
base64[i+2] = coding[a[1] != 0 ? ((a[1] & 15) << 2) | (a[2] >> 6) : 64];
base64[i+1] = coding[*a != 0 ? ((*a & 3) << 4) | (a[1] >> 4) : 64];
base64[i] = coding[*a != 0 ? *a >> 2 : 64];
}Context
StackExchange Code Review Q#26826, answer score: 5
Revisions (0)
No revisions yet.