patterncsharpModerate
Base-36 encoding of a byte array
Viewed 0 times
arrayencodingbytebase
Problem
After posting a question about Alphanumeric Hash generation on StackOverflow, the most helpful answer was to change the conversion method from pulling 5-bit chunks of a binary hash value, to instead changing the number to base-36.
This is pretty straightforward; find the quotient and remainder of the number divided by 36, encode the remainder as the next most significant digit of the result, then repeat with the quotient.
Sounds great, except that if I want to start with an integer hash 128 bits or greater, I can't just use the divide operator. In that case, I have to use "long division":
So, to base-36-encode a large integer, stored as a byte array, I have the following method, which performs the basic iterative algorithm for binary long division, storing the result in another byte array and returning the modulus as an output parameter:
```
public static byte[] DivideBy(this byte[] bytes, ulong divisor, out ulong mod, bool preserveSize = true)
{
//the byte array MUST be little-endian here or the operation will be totally fubared.
var bitArray = new BitArray(bytes);
ulong buffer = 0;
byte quotientBuffer = 0;
byte qBufferLen = 0;
var quotient = new List();
//the bitarray indexes its values in little-endian fashion;
//as the index increases we move from LSB to MSB.
for (var i = bitArray.Count - 1; i >= 0; --i)
{
//The basic idea is similar to decimal long division;
//starting from the most significant bit, take enough bits
//to form a number divisible by (greater than) the divisor.
buffer = (buffer = divisor)
{
//Now divide; buffer will never be >= divisor * 2,
//so the quotient of buffer / divisor is always 1...
quotientBuffer = (byte)((quotientBuffer 0 || quotientBuffer > 0)
This is pretty straightforward; find the quotient and remainder of the number divided by 36, encode the remainder as the next most significant digit of the result, then repeat with the quotient.
Sounds great, except that if I want to start with an integer hash 128 bits or greater, I can't just use the divide operator. In that case, I have to use "long division":
Decimal:
18 R 3
__
5)93
-5
43
-40
3
Binary:
00010010 R 11
________
0101)01011101
-0101
0000110
-0101
00011So, to base-36-encode a large integer, stored as a byte array, I have the following method, which performs the basic iterative algorithm for binary long division, storing the result in another byte array and returning the modulus as an output parameter:
```
public static byte[] DivideBy(this byte[] bytes, ulong divisor, out ulong mod, bool preserveSize = true)
{
//the byte array MUST be little-endian here or the operation will be totally fubared.
var bitArray = new BitArray(bytes);
ulong buffer = 0;
byte quotientBuffer = 0;
byte qBufferLen = 0;
var quotient = new List();
//the bitarray indexes its values in little-endian fashion;
//as the index increases we move from LSB to MSB.
for (var i = bitArray.Count - 1; i >= 0; --i)
{
//The basic idea is similar to decimal long division;
//starting from the most significant bit, take enough bits
//to form a number divisible by (greater than) the divisor.
buffer = (buffer = divisor)
{
//Now divide; buffer will never be >= divisor * 2,
//so the quotient of buffer / divisor is always 1...
quotientBuffer = (byte)((quotientBuffer 0 || quotientBuffer > 0)
Solution
If performance means anything to you, use
As an added bonus,
There are a few other improvements to be made to your code. You needn't define a char array for the alphabet, better just use a string constant - you can still access the characters by index. Also, avoid converting to
Putting it all together:
Usage
Code
Performance
The difference is huge (running without the debugger, compiled in release configuration, timed with a
With eleven characters of input running one million times,
The difference is amplified when feeding it longer strings, as shown in this example with an input of 34 chars at one hundred thousand iterations (13.5 times faster):
BigInteger.DivRem, as Guvante suggested (don't forget to add System.Numerics to your project references and using directives). As an added bonus,
BigInteger can handle both negative divisors and arbitrarily large divisors (tested it with a 128 bit long divisor, worked fine).There are a few other improvements to be made to your code. You needn't define a char array for the alphabet, better just use a string constant - you can still access the characters by index. Also, avoid converting to
IEnumerable from byte[] reversing and going back again. Instead, let the toConvert parameter be of type byte[] from the beginning and then, if needed, reverse it using Array.Reverse (it actually mutates the array, so you don't need to assign a new one).Putting it all together:
Usage
byte[] bytes = Encoding.UTF8.GetBytes("A test 1234"); // I assume that's how you were converting
string hash = bytes.ToBase36String(); // from string to byte[] anyway..Code
public static string ToBase36String(this byte[] toConvert, bool bigEndian = false)
{
const string alphabet = "0123456789abcdefghijklmnopqrstuvwxyz";
if (bigEndian) Array.Reverse(toConvert); // !BitConverter.IsLittleEndian might be an alternative
BigInteger dividend = new BigInteger(toConvert);
var builder = new StringBuilder();
while (dividend != 0)
{
BigInteger remainder;
dividend = BigInteger.DivRem(dividend, 36, out remainder);
builder.Insert(0, alphabet[Math.Abs(((int)remainder))]);
}
return builder.ToString();
}Performance
The difference is huge (running without the debugger, compiled in release configuration, timed with a
System.Diagnostics.Stopwatch; i7@2.8GHz).With eleven characters of input running one million times,
BigInteger is 7.5 times faster:The difference is amplified when feeding it longer strings, as shown in this example with an input of 34 chars at one hundred thousand iterations (13.5 times faster):
Code Snippets
byte[] bytes = Encoding.UTF8.GetBytes("A test 1234"); // I assume that's how you were converting
string hash = bytes.ToBase36String(); // from string to byte[] anyway..public static string ToBase36String(this byte[] toConvert, bool bigEndian = false)
{
const string alphabet = "0123456789abcdefghijklmnopqrstuvwxyz";
if (bigEndian) Array.Reverse(toConvert); // !BitConverter.IsLittleEndian might be an alternative
BigInteger dividend = new BigInteger(toConvert);
var builder = new StringBuilder();
while (dividend != 0)
{
BigInteger remainder;
dividend = BigInteger.DivRem(dividend, 36, out remainder);
builder.Insert(0, alphabet[Math.Abs(((int)remainder))]);
}
return builder.ToString();
}Context
StackExchange Code Review Q#14084, answer score: 10
Revisions (0)
No revisions yet.