HiveBrain v1.2.0
Get Started
← Back to all entries
patternModerate

Mathematical benefit to use CPU/memory that increases by powers of 2 as 8-bit, 16-bit, 32-bit, 64-bit, etc?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
bitbenefitincreasesthatmemorycpuusepowersmathematicaletc

Problem

Historically, processors often increase in bit-size by powers of 2, such as 8-bit, then 16-bit, 32-bit, 64-bit. Although this has not always been the case, it is a well known trend. One benefit is backwards compatibility. Another is that components can be scaled by combining two 8-bit components into a 16-bit, and so on, such as two 8-bit shift registers. I have also heard that some types of mathematical algorithms benefit from using powers of 2 bit sizes, but I cannot find anything about it, nor think of any. Is anyone here aware of any such algorithms, or such an argument for "powers of 2 bit sizes" in general?

Solution

Multiplying and dividing binary integers by powers of 2 is very cheap, like multiplying and dividing by 10 for decimal numbers.

It's qualitatively different from other factors because the other bits don't change, you just move digits left or right (shifting). In hardware for fixed shift counts, it's literally just wiring.

As I mentioned below, array indexing like arr[i] with an index that counts in elements will involve (in hardware or software) scaling by the element size, if memory is byte addressable. (If it's not in a loop, it can't be optimized to a pointer increment). IDK if you'd call this an "algorithm", though.

https://superuser.com/questions/1735122/why-are-most-of-the-common-processors-bit-counts-powers-of-2/1735399#1735399 and other answers discuss some of the engineering problems you'd have with e.g. 24-bit words on a byte-addressable machine. (24-bit DSPs use word-addressable memory) - some 3-byte words will span wider boundaries, differing in higher bits of the address. But CPU caches and memory systems really want to group binary addresses by ignoring some low bits, not by doing a full division.

In algorithm terms, this is optimization of i / sizeof(int) into a right shift for unsigned i, and i * sizeof(int) into a left shift.

Another algorithm where this is relevant is a bit-array: given an integer bit-index, you need to break that up into a byte index or word index, and a bit number within the byte (to use as a shift count).

#include 
int get_bit(unsigned int *bit_array, size_t n)
{
    size_t element_idx = n / std::numeric_limits::digits;
    int bit_idx = n % std::numeric_limits::digits;
    unsigned chunk = bit_array[element_idx];
    return  (chunk >> bit_idx) & 1;
}


When numeric_limits::digits is a power of 2, like 32 on typical C++ implementations, this unsigned division and remainder become simple right-shift and bitwise AND, like n >> 5 and n & 31.

This wouldn't be the case for 24-bit unsigned int, for example, unless you choose to only use 16 of the 24 bits per unsigned int. Division by a constant 24 or 3 is not very expensive if you have a fast multiplier, but it's non-trivial.

In hardware it's even more significant: splitting a value into multiple fields is literally just wiring. But for n%24, the remainder depends on all 24 bits, not just the low 5 bits, as well as requiring some significant computation.

std::vector's .size() member function, and indexing

Another more widely used data structure that benefits from it being very cheap to convert between element index and addresses is C++'s std::vector, a dynamic array that knows its size and can reallocate when push_back exceeds the current capacity. Typical implementations have 3 pointer members, perhaps named T *m_begin, m_end, and m_capacity.

The API includes .begin() and .data() which are trivial, just return the the start pointer. Similarly .end() can just return the end pointer. (C++ library functions that loop over a range take a start and end pointer, rather than a pointer + length). .push_back can increment m_end += sizeof(T) and check m_end < m_capacity is still true. (The size in bytes to pass to allocator functions can be obtained by casting to char* before subtracting.) None of that benefits from sizeof(T) being a power of 2, which is good for use-cases with T = an odd sized struct.

But std::vector also has a .size() API, typically implemented as return end - begin;. In C++, subtracting pointers returns the distance in elements, not bytes, so in asm it typically looks like sub r0, r1 / sra r0, 2 or something to arithmetic right shift the subtraction result to scale by sizeof(T). If sizeof(T) isn't a power of 2, a right shift won't work for this division. (It is at least a compile-time constant so a multiply and shift can do the trick.)

It's not rare for code to do for (size_t i = 0 ; i < vec.size() ; i++) do stuff with vec[i], or to use .size() for other things. Compilers might transform this loop into a pointer increment, avoiding the division by sizeof(vec[0]), but some uses of .size() won't optimize away.

A non-power-of-2 sizeof(T) for common primitive types like int and int * would make some uses of vectors of those types less efficient. (But implementations would probably keep the 3-pointer design, rather than needing to multiply by sizeof(T) more often with one pointer and size_t length, capacity.

It would also mean alignof(int) had to be 1, unless an implementation padded 3-byte int to a larger size. Unless you redefine alignment to mean multiple-of-3 instead of multiple-of-2.

Also, array indexing by an integer is normally just a left-shift to multiply by sizeof(T) to get a byte offset.

Perhaps such a machine would include scaled-index addressing modes so asm that could scale by 3 and 6, with a shift+add like i + (i<<1) instead of just a shift. For example current x86-64 c

Code Snippets

#include <limits>
int get_bit(unsigned int *bit_array, size_t n)
{
    size_t element_idx = n / std::numeric_limits<unsigned int>::digits;
    int bit_idx = n % std::numeric_limits<unsigned int>::digits;
    unsigned chunk = bit_array[element_idx];
    return  (chunk >> bit_idx) & 1;
}

Context

StackExchange Computer Science Q#164563, answer score: 11

Revisions (0)

No revisions yet.