patterncppMinor
Converting hex to integer
Viewed 0 times
convertinghexinteger
Problem
What's the fastest way of doing this?
int hextoint(char number) {
if (number == '0') {
return 0;
}
if (number == '1') {
return 1;
}
if (number == '2') {
return 2;
}
/*
* 3 through 8
*/
if (number == '9') {
return 9;
}
if (number == 'a') {
return 10;
}
if (number == 'b') {
return 11;
}
if (number == 'c') {
return 12;
}
if (number == 'd') {
return 13;
}
if (number == 'e') {
return 14;
}
if (number == 'f') {
return 15;
}
return -1;
}Solution
It seems that RippeR isn't going to write a full answer, so here's my take (especially as my comments to another answer got deleted together with the little jewel that proposed
First of all, the performance charactistics of a small function like hex digit conversion depend heavily on the context where it is used, and so it's impossible to say what's fastest until a concrete usage context has been specified.
The problem of hex digit conversion is a bit special in that it has a particular solution - the lookup table - which has excellent performance characteristics across a wide range of usages, and the smallest possible code footprint at the same time. It does, however, impose a bit of overhead for the initialisation of the lookup table, either in the form of static initialisation or initialisation with code.
Anyway: regardless of how the lookup table gets initialised, it will usually be used via a litte function wrapper (or member function, if wrapped in a class):
The cast to
Modern compilers will usually inline the code even if it resides in a .cpp somewhere, so there's no need to get cute. In an actual implementation the lookup table could be integrated with the character classification table for the scanner/lexer, but here I filled all non-digit slots with
It takes more space than static initialisation but it's more flexible, which is good for experimentation. It also makes the code independent from the execution character set in the sense that it's not necessary to know up front which character maps to which code point.
To get some timings I pitted the function against three other versions. To make the test realistic, I adapted a function from a high-performance scanner:
Besides allowing tests with a bunch of 8-digit hex strings, it can also be called with huge strings to get a different take on performance characteristics. Excess digits are lost because they get shifted out but
The first contender is the OP's own solution:
The second contender looks almost the same but it uses a switch statement instead of conditionals. If you think that it has little merit then you may be in for a little surprise when you look at the timings:
Last but not least, a reasonably straightforward solution where I lent helping hand to the compiler regarding the math (something we used to do a lot with older compilers):
Note, however, that this depends on three character ranges being contiguous and hence might not work with execution character sets that aren't based on ASCII and its descendants. The other three solutions are completely immune to such problems.
The test frame fills a big string with random hex digits (including upper case
std::unordered_map).First of all, the performance charactistics of a small function like hex digit conversion depend heavily on the context where it is used, and so it's impossible to say what's fastest until a concrete usage context has been specified.
The problem of hex digit conversion is a bit special in that it has a particular solution - the lookup table - which has excellent performance characteristics across a wide range of usages, and the smallest possible code footprint at the same time. It does, however, impose a bit of overhead for the initialisation of the lookup table, either in the form of static initialisation or initialisation with code.
Anyway: regardless of how the lookup table gets initialised, it will usually be used via a litte function wrapper (or member function, if wrapped in a class):
enum { NOT_A_DIGIT = (1 << CHAR_BIT) - 1 };
typedef unsigned char byte;
byte DigitValue[1 << CHAR_BIT];
// ...
unsigned hex_digit_value_v3 (char c)
{
return DigitValue[byte(c)];
}The cast to
byte (i.e. unsigned char) is necessary because char can be - and often is - signed.Modern compilers will usually inline the code even if it resides in a .cpp somewhere, so there's no need to get cute. In an actual implementation the lookup table could be integrated with the character classification table for the scanner/lexer, but here I filled all non-digit slots with
NOT_A_DIGIT. For completeness' sake, here's the initialisation code I used:void set_lookup_range_ (byte base_value, char lo, char hi)
{
for (byte c = byte(lo); ; ++c)
{
DigitValue[c] = byte(base_value + (c - byte(lo))); // cast needed since ari widens to int
if (c >= byte(hi))
return;
}
}
void initialise_lookup_table ()
{
std::memset(DigitValue, NOT_A_DIGIT, sizeof(DigitValue));
set_lookup_range_( 0, '0', '9');
set_lookup_range_(10, 'A', 'F');
set_lookup_range_(10, 'a', 'f');
}It takes more space than static initialisation but it's more flexible, which is good for experimentation. It also makes the code independent from the execution character set in the sense that it's not necessary to know up front which character maps to which code point.
To get some timings I pitted the function against three other versions. To make the test realistic, I adapted a function from a high-performance scanner:
// to be called with read_ptr pointing at a hex digit; digits that don't fit into an unsigned get
// dropped quietly (i.e. they shifted out at the upper end)
template
DECLSPEC_NOINLINE
unsigned extract_hex_unsigned (char const **read_ptr)
{
char const *p = *read_ptr;
unsigned value = hex_digit_value(*p);
unsigned digit;
while ((digit = hex_digit_value(*++p)) < 16)
value = (value << 4) + digit;
*read_ptr = p;
return value;
}Besides allowing tests with a bunch of 8-digit hex strings, it can also be called with huge strings to get a different take on performance characteristics. Excess digits are lost because they get shifted out but
hex_digit_value() gets called for each of them and that's what matters for the timings.The first contender is the OP's own solution:
unsigned hex_digit_value_v0 (char c)
{
if (c == '0') return 0;
if (c == '1') return 1;
// ... 18 more not shown ...
if (c == 'e') return 14;
if (c == 'f') return 15;
return NOT_A_DIGIT;
}The second contender looks almost the same but it uses a switch statement instead of conditionals. If you think that it has little merit then you may be in for a little surprise when you look at the timings:
unsigned hex_digit_value_v1 (char c)
{
switch (c)
{
case '0': return 0;
case '1': return 1;
// ... 18 more not shown ...
case 'e': return 14;
case 'f': return 15;
}
return NOT_A_DIGIT;
}Last but not least, a reasonably straightforward solution where I lent helping hand to the compiler regarding the math (something we used to do a lot with older compilers):
unsigned hex_digit_value_v2 (char c)
{
switch (c)
{
case '0': case '1': case '2': case '3': case '4':
case '5': case '6': case '7': case '8': case '9':
return byte(c - '0');
case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
return byte(c - 'A' + 10);
case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
return byte(c - 'a' + 10);
}
return NOT_A_DIGIT;
}Note, however, that this depends on three character ranges being contiguous and hence might not work with execution character sets that aren't based on ASCII and its descendants. The other three solutions are completely immune to such problems.
The test frame fills a big string with random hex digits (including upper case
Code Snippets
enum { NOT_A_DIGIT = (1 << CHAR_BIT) - 1 };
typedef unsigned char byte;
byte DigitValue[1 << CHAR_BIT];
// ...
unsigned hex_digit_value_v3 (char c)
{
return DigitValue[byte(c)];
}void set_lookup_range_ (byte base_value, char lo, char hi)
{
for (byte c = byte(lo); ; ++c)
{
DigitValue[c] = byte(base_value + (c - byte(lo))); // cast needed since ari widens to int
if (c >= byte(hi))
return;
}
}
void initialise_lookup_table ()
{
std::memset(DigitValue, NOT_A_DIGIT, sizeof(DigitValue));
set_lookup_range_( 0, '0', '9');
set_lookup_range_(10, 'A', 'F');
set_lookup_range_(10, 'a', 'f');
}// to be called with read_ptr pointing at a hex digit; digits that don't fit into an unsigned get
// dropped quietly (i.e. they shifted out at the upper end)
template<unsigned hex_digit_value (char c)>
DECLSPEC_NOINLINE
unsigned extract_hex_unsigned (char const **read_ptr)
{
char const *p = *read_ptr;
unsigned value = hex_digit_value(*p);
unsigned digit;
while ((digit = hex_digit_value(*++p)) < 16)
value = (value << 4) + digit;
*read_ptr = p;
return value;
}unsigned hex_digit_value_v0 (char c)
{
if (c == '0') return 0;
if (c == '1') return 1;
// ... 18 more not shown ...
if (c == 'e') return 14;
if (c == 'f') return 15;
return NOT_A_DIGIT;
}unsigned hex_digit_value_v1 (char c)
{
switch (c)
{
case '0': return 0;
case '1': return 1;
// ... 18 more not shown ...
case 'e': return 14;
case 'f': return 15;
}
return NOT_A_DIGIT;
}Context
StackExchange Code Review Q#114418, answer score: 5
Revisions (0)
No revisions yet.