Base64 encoder/decoder optimizations

Submitted by: @import:stackexchange-codereview

Problem

I've written a Base64 encoder/decoder, which works great. Now I want to see if I can get it working better. I've optimized as much as I can think of, but it may be missing some things. The encoder can encode a 160 MB file in 30 seconds, but the decoder takes nearly 60.

So far the optimizations I've done are:

  • Pre-allocate the output file size for encoding, using the formula on Wikipedia.
  • Pre-allocate the output file size for decoding, using the inverse of the encoding formula.
  • Use bitwise operations for byte and symbol manipulation.
  • Use an in-memory array for encoding.
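
The two size formulas mentioned in the list can be sketched as follows. This is only an illustration, assuming padded Base64 output with no line breaks; the helper names `EncodedSize` and `DecodedSize` are hypothetical and do not come from the original code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of Base64 characters (including '=' padding)
// produced for n input bytes, i.e. 4 * ceil(n / 3).
// Assumption: no line breaks are inserted into the output.
inline std::size_t EncodedSize(std::size_t n) {
    return 4 * ((n + 2) / 3);
}

// Hypothetical helper: number of bytes obtained by decoding a padded Base64
// string. Assumption: the input is well formed, so its length is a multiple
// of 4; anything else is reported as 0 here for simplicity.
inline std::size_t DecodedSize(const std::string& encoded) {
    if (encoded.empty() || encoded.size() % 4 != 0) {
        return 0;
    }
    std::size_t padding = 0;
    if (encoded[encoded.size() - 1] == '=') ++padding;
    if (encoded[encoded.size() - 2] == '=') ++padding;
    return 3 * (encoded.size() / 4) - padding;
}
```

For example, the three bytes of "Man" encode to the four characters "TWFu", while the single byte "M" encodes to "TQ==" and decodes back to one byte.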



One thing I don't know how to improve is the use of a std::map for decoding: each lookup is O(log n), and so is each insertion when building the map (albeit the map is only built once).
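
For comparison, one common alternative to a std::map is a flat 256-entry reverse table indexed by the symbol's byte value, which makes each lookup O(1). The sketch below is not taken from the original code or its review; the names `MakeDecodingTable` and `SymbolIndex` are hypothetical, and it assumes the standard alphabet shown in the encoder's table.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper: builds a reverse table mapping each byte value to its
// Base64 index (0-63), with -1 for bytes outside the alphabet.
inline std::array<signed char, 256> MakeDecodingTable() {
    const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789+/";
    std::array<signed char, 256> table;
    table.fill(-1);
    for (std::size_t i = 0; i < alphabet.size(); ++i) {
        // Index through unsigned char so the subscript is never negative.
        table[static_cast<unsigned char>(alphabet[i])] =
            static_cast<signed char>(i);
    }
    return table;
}

// Hypothetical helper: constant-time lookup of a symbol's index.
inline int SymbolIndex(const std::array<signed char, 256>& table, char symbol) {
    return table[static_cast<unsigned char>(symbol)];
}
```

The table would be built once and reused for every symbol; the padding character '=' maps to -1 and has to be handled separately by the caller.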

Encoder:

```
#include "Base64Encoder.h"

#include
#include

const char Base64Encoder::EncodingTable[64] = {'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z', //0-25
'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z', //26-51
'0','1','2','3','4','5','6','7','8','9', //52-61
'+','/'}; //62-63

const char Base64Encoder::PADDING_CHAR = '=';

Base64Encoder::Base64Encoder() { /* DO NOTHING */ }

Base64Encoder::~Base64Encoder() { /* DO NOTHING */ }

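int Base64Encoder::GetFirstSymbolIndex(char* encoding_buffer) {
    // Top six bits of byte 0.
    return ((encoding_buffer[0] & 0xFC) >> 2);
}

int Base64Encoder::GetSecondSymbolIndex(char* encoding_buffer) {
    // Low two bits of byte 0 followed by the high four bits of byte 1.
    return (((encoding_buffer[0] & 0x03) << 4) | ((encoding_buffer[1] & 0xF0) >> 4));
}

int Base64Encoder::GetThirdSymbolIndex(char* encoding_buffer) {
    // Low four bits of byte 1 followed by the high two bits of byte 2.
    return (((encoding_buffer[1] & 0x0F) << 2) | ((encoding_buffer[2] & 0xC0) >> 6));
}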

int Base64Encoder::GetFour
```

Solution

I realise this is an old post but just came across it and couldn't help but notice the following pattern in the original post which was not addressed in the review:

(encoding_buffer[1] & 0xF0) >> 4


Constructs like that are dangerous because encoding_buffer holds char rather than unsigned char. Whether plain char is signed is implementation-defined, and for a negative value it has traditionally been up to the compiler whether a right shift is arithmetic (the leftmost bit is repeated on the left) or logical (zeros are filled in on the left). It would be far safer to rewrite the expression as:

(encoding_buffer[1] >> 4) & 0x0F


As a general rule it's better to do the shift first and apply the mask later.

The original code may or may not work on the designated platform but is certainly not portable.
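
A minimal sketch of the shift-then-mask idea, alongside the related option of converting to unsigned char before shifting, might look like this. The helper names are hypothetical and are not part of the original Base64Encoder class.

```cpp
// Hypothetical helper: shift first, then mask, as suggested above.
// The final mask discards any sign bits that an arithmetic shift copies in.
inline int HighNibbleShiftThenMask(char c) {
    return (c >> 4) & 0x0F;
}

// Hypothetical helper: convert to unsigned char first, so the value being
// shifted is never negative and no mask is needed.
inline int HighNibbleUnsigned(char c) {
    return static_cast<unsigned char>(c) >> 4;
}

// Checks that both helpers return the expected high nibble for every
// possible byte value.
inline bool HelpersAgreeForAllBytes() {
    for (int b = 0; b < 256; ++b) {
        const char c = static_cast<char>(b);
        if (HighNibbleShiftThenMask(c) != (b >> 4)) return false;
        if (HighNibbleUnsigned(c) != (b >> 4)) return false;
    }
    return true;
}
```

Declaring the buffer itself as unsigned char would remove the need for the cast at each use.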

Context

StackExchange Code Review Q#15780, answer score: 7
