HiveBrain v1.2.0
Get Started
← Back to all entries
patterncppMinor

Compile-time string hash

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
hashstringtimecompile

Problem

A while back this question proposed a constexpr compile-time Sieve of Eratosthenes. It also linked to another SO question about compile-time computation of CRCs. I've found a place where I'd like to precompute the hash of a few C-strings, so I used those previous questions as a base.

I'm implementing the one-at-a-time hash algorithm for C-style strings. I'm stuck to using C++11, so the function has to be recursive and without any local state:

namespace ct
{

constexpr std::uint32_t stringLength(const char * cstr)
{
    return (*cstr != '\0') ? (stringLength(cstr + 1) + 1) : 0;
}

constexpr std::uint32_t sumSHL(std::uint32_t h, std::uint32_t shift) { return h + (h > shift); }
constexpr std::uint32_t xorSHR(std::uint32_t h, std::uint32_t shift) { return h ^ (h >> shift); }

constexpr std::uint32_t hashFinishImpl(std::uint32_t h)
{
    // h += (h > 11)
    // h += (h >  6)
    return xorSHR(sumSHL(h + c, 10), 6);
}

constexpr std::uint32_t hashImpl(const char * cstr, std::uint32_t length, std::uint32_t h)
{
    return (length != 0) ? hashImpl(cstr + 1, length - 1, hashStepImpl(h, *cstr)) : hashFinishImpl(h);
}

constexpr std::uint32_t hashCString(const char * cstr)
{
    return hashImpl(cstr, stringLength(cstr), 0);
}

} // namespace ct {}


I'm thinking it looks a bit convoluted, but couldn't think of a better way of handling it... Any suggestions are appreciated.

Solution

Your code looks very clean to me and I already like it very much. A few minor remarks.

  • Assuming that you'll probably want to provide this functionality in a header file, you should declare all your functions as inline. Even if they're not in a header file, it doesn't hurt to do so.



  • Consider declaring your functions as noexcept, too.



  • I'd prefer to see the implementation details hidden in a “detailnamespace to make it clear that only hashCString is supposed to be used directly. (Or is this assumption incorrect?)



  • ct::hashCString is not the most descriptive name. I'd prefer if the name somehow indicated the implemented hash algorithm.



  • You don't actually need the length of the string. Just replace length != 0 with *cstr != '\0' in hashImpl and figure out the length as you go. This will save you one function.



  • It seems to me that “plus” or “add” would be more consistent with “xor” than “sum” as the emphasis is on the operation.



-
Your code assumes that char is an 8 bit integer. It would also be nice if you could hash arrays of signed and unsigned chars alike. Something like the following might do.

template 
constexpr typename std::enable_if::value && (CHAR_BIT * sizeof(CharT) == 8), std::uint8_t>::type
hashCString(const CharT *const s) noexcept;


Instead of requiring CHAR_BIT sizeof(CharT) == 8, you could relax it to (CHAR_BIT sizeof(CharT)) % 8 == 0 and manually loop over each octet in wider types. Smells like overkill, though.

  • Another possible generalization would be to provide the classical interface that takes a pair of “begin” and “end” iterators. This might be handy for hashing sub-strings, strings with embedded zeros or strings that are not terminated with a 0 byte. You can still provide your single-argument version as a convenience overload.

Code Snippets

template <typename CharT>
constexpr typename std::enable_if<std::is_integral<CharT>::value && (CHAR_BIT * sizeof(CharT) == 8), std::uint8_t>::type
hashCString(const CharT *const s) noexcept;

Context

StackExchange Code Review Q#124295, answer score: 3

Revisions (0)

No revisions yet.