HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpModerate

Base-36 encoding of a byte array

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
arrayencodingbytebase

Problem

After posting a question about Alphanumeric Hash generation on StackOverflow, the most helpful answer was to change the conversion method from pulling 5-bit chunks of a binary hash value, to instead changing the number to base-36.

This is pretty straightforward; find the quotient and remainder of the number divided by 36, encode the remainder as the next most significant digit of the result, then repeat with the quotient.

Sounds great, except that if I want to start with an integer hash 128 bits or greater, I can't just use the divide operator. In that case, I have to use "long division":

Decimal:
  18 R 3
  __
5)93
 -5
  43
 -40
   3

Binary:
     00010010 R 11
     ________
0101)01011101
    -0101
     0000110
       -0101
        00011


So, to base-36-encode a large integer, stored as a byte array, I have the following method, which performs the basic iterative algorithm for binary long division, storing the result in another byte array and returning the modulus as an output parameter:

```
public static byte[] DivideBy(this byte[] bytes, ulong divisor, out ulong mod, bool preserveSize = true)
{
//the byte array MUST be little-endian here or the operation will be totally fubared.
var bitArray = new BitArray(bytes);

ulong buffer = 0;
byte quotientBuffer = 0;
byte qBufferLen = 0;
var quotient = new List();

//the bitarray indexes its values in little-endian fashion;
//as the index increases we move from LSB to MSB.
for (var i = bitArray.Count - 1; i >= 0; --i)
{
//The basic idea is similar to decimal long division;
//starting from the most significant bit, take enough bits
//to form a number divisible by (greater than) the divisor.
buffer = (buffer = divisor)
{
//Now divide; buffer will never be >= divisor * 2,
//so the quotient of buffer / divisor is always 1...
quotientBuffer = (byte)((quotientBuffer 0 || quotientBuffer > 0)

Solution

If performance means anything to you, use BigInteger.DivRem, as Guvante suggested (don't forget to add System.Numerics to your project references and using directives).

As an added bonus, BigInteger can handle both negative divisors and arbitrarily large divisors (tested it with a 128 bit long divisor, worked fine).

There are a few other improvements to be made to your code. You needn't define a char array for the alphabet, better just use a string constant - you can still access the characters by index. Also, avoid converting to IEnumerable from byte[] reversing and going back again. Instead, let the toConvert parameter be of type byte[] from the beginning and then, if needed, reverse it using Array.Reverse (it actually mutates the array, so you don't need to assign a new one).

Putting it all together:

Usage

byte[] bytes = Encoding.UTF8.GetBytes("A test 1234"); // I assume that's how you were converting 
string hash = bytes.ToBase36String();                 // from string to byte[] anyway..


Code

public static string ToBase36String(this byte[] toConvert, bool bigEndian = false)
{
    const string alphabet = "0123456789abcdefghijklmnopqrstuvwxyz";
    if (bigEndian) Array.Reverse(toConvert); // !BitConverter.IsLittleEndian might be an alternative
    BigInteger dividend = new BigInteger(toConvert);
    var builder = new StringBuilder();
    while (dividend != 0)
    {
        BigInteger remainder;
        dividend = BigInteger.DivRem(dividend, 36, out remainder);
        builder.Insert(0, alphabet[Math.Abs(((int)remainder))]);
    }    
    return builder.ToString();
}


Performance

The difference is huge (running without the debugger, compiled in release configuration, timed with a System.Diagnostics.Stopwatch; i7@2.8GHz).

With eleven characters of input running one million times, BigInteger is 7.5 times faster:

The difference is amplified when feeding it longer strings, as shown in this example with an input of 34 chars at one hundred thousand iterations (13.5 times faster):

Code Snippets

byte[] bytes = Encoding.UTF8.GetBytes("A test 1234"); // I assume that's how you were converting 
string hash = bytes.ToBase36String();                 // from string to byte[] anyway..
public static string ToBase36String(this byte[] toConvert, bool bigEndian = false)
{
    const string alphabet = "0123456789abcdefghijklmnopqrstuvwxyz";
    if (bigEndian) Array.Reverse(toConvert); // !BitConverter.IsLittleEndian might be an alternative
    BigInteger dividend = new BigInteger(toConvert);
    var builder = new StringBuilder();
    while (dividend != 0)
    {
        BigInteger remainder;
        dividend = BigInteger.DivRem(dividend, 36, out remainder);
        builder.Insert(0, alphabet[Math.Abs(((int)remainder))]);
    }    
    return builder.ToString();
}

Context

StackExchange Code Review Q#14084, answer score: 10

Revisions (0)

No revisions yet.