patternModerate
Mathematical benefit to use CPU/memory that increases by powers of 2 as 8-bit, 16-bit, 32-bit, 64-bit, etc?
Viewed 0 times
bitbenefitincreasesthatmemorycpuusepowersmathematicaletc
Problem
Historically, processors often increase in bit-size by powers of 2, such as 8-bit, then 16-bit, 32-bit, 64-bit. Although this has not always been the case, it is a well known trend. One benefit is backwards compatibility. Another is that components can be scaled by combining two 8-bit components into a 16-bit, and so on, such as two 8-bit shift registers. I have also heard that some types of mathematical algorithms benefit from using powers of 2 bit sizes, but I cannot find anything about it, nor think of any. Is anyone here aware of any such algorithms, or such an argument for "powers of 2 bit sizes" in general?
Solution
Multiplying and dividing binary integers by powers of 2 is very cheap, like multiplying and dividing by 10 for decimal numbers.
It's qualitatively different from other factors because the other bits don't change, you just move digits left or right (shifting). In hardware for fixed shift counts, it's literally just wiring.
As I mentioned below, array indexing like
https://superuser.com/questions/1735122/why-are-most-of-the-common-processors-bit-counts-powers-of-2/1735399#1735399 and other answers discuss some of the engineering problems you'd have with e.g. 24-bit words on a byte-addressable machine. (24-bit DSPs use word-addressable memory) - some 3-byte words will span wider boundaries, differing in higher bits of the address. But CPU caches and memory systems really want to group binary addresses by ignoring some low bits, not by doing a full division.
In algorithm terms, this is optimization of
Another algorithm where this is relevant is a bit-array: given an integer bit-index, you need to break that up into a byte index or word index, and a bit number within the byte (to use as a shift count).
When
This wouldn't be the case for 24-bit
In hardware it's even more significant: splitting a value into multiple fields is literally just wiring. But for
Another more widely used data structure that benefits from it being very cheap to convert between element index and addresses is C++'s
The API includes
But
It's not rare for code to do
A non-power-of-2
It would also mean
Also, array indexing by an integer is normally just a left-shift to multiply by
Perhaps such a machine would include scaled-index addressing modes so asm that could scale by 3 and 6, with a shift+add like
It's qualitatively different from other factors because the other bits don't change, you just move digits left or right (shifting). In hardware for fixed shift counts, it's literally just wiring.
As I mentioned below, array indexing like
arr[i] with an index that counts in elements will involve (in hardware or software) scaling by the element size, if memory is byte addressable. (If it's not in a loop, it can't be optimized to a pointer increment). IDK if you'd call this an "algorithm", though.https://superuser.com/questions/1735122/why-are-most-of-the-common-processors-bit-counts-powers-of-2/1735399#1735399 and other answers discuss some of the engineering problems you'd have with e.g. 24-bit words on a byte-addressable machine. (24-bit DSPs use word-addressable memory) - some 3-byte words will span wider boundaries, differing in higher bits of the address. But CPU caches and memory systems really want to group binary addresses by ignoring some low bits, not by doing a full division.
In algorithm terms, this is optimization of
i / sizeof(int) into a right shift for unsigned i, and i * sizeof(int) into a left shift.Another algorithm where this is relevant is a bit-array: given an integer bit-index, you need to break that up into a byte index or word index, and a bit number within the byte (to use as a shift count).
#include
int get_bit(unsigned int *bit_array, size_t n)
{
size_t element_idx = n / std::numeric_limits::digits;
int bit_idx = n % std::numeric_limits::digits;
unsigned chunk = bit_array[element_idx];
return (chunk >> bit_idx) & 1;
}When
numeric_limits::digits is a power of 2, like 32 on typical C++ implementations, this unsigned division and remainder become simple right-shift and bitwise AND, like n >> 5 and n & 31.This wouldn't be the case for 24-bit
unsigned int, for example, unless you choose to only use 16 of the 24 bits per unsigned int. Division by a constant 24 or 3 is not very expensive if you have a fast multiplier, but it's non-trivial.In hardware it's even more significant: splitting a value into multiple fields is literally just wiring. But for
n%24, the remainder depends on all 24 bits, not just the low 5 bits, as well as requiring some significant computation.std::vector's .size() member function, and indexingAnother more widely used data structure that benefits from it being very cheap to convert between element index and addresses is C++'s
std::vector, a dynamic array that knows its size and can reallocate when push_back exceeds the current capacity. Typical implementations have 3 pointer members, perhaps named T *m_begin, m_end, and m_capacity.The API includes
.begin() and .data() which are trivial, just return the the start pointer. Similarly .end() can just return the end pointer. (C++ library functions that loop over a range take a start and end pointer, rather than a pointer + length). .push_back can increment m_end += sizeof(T) and check m_end < m_capacity is still true. (The size in bytes to pass to allocator functions can be obtained by casting to char* before subtracting.) None of that benefits from sizeof(T) being a power of 2, which is good for use-cases with T = an odd sized struct.But
std::vector also has a .size() API, typically implemented as return end - begin;. In C++, subtracting pointers returns the distance in elements, not bytes, so in asm it typically looks like sub r0, r1 / sra r0, 2 or something to arithmetic right shift the subtraction result to scale by sizeof(T). If sizeof(T) isn't a power of 2, a right shift won't work for this division. (It is at least a compile-time constant so a multiply and shift can do the trick.)It's not rare for code to do
for (size_t i = 0 ; i < vec.size() ; i++) do stuff with vec[i], or to use .size() for other things. Compilers might transform this loop into a pointer increment, avoiding the division by sizeof(vec[0]), but some uses of .size() won't optimize away.A non-power-of-2
sizeof(T) for common primitive types like int and int * would make some uses of vectors of those types less efficient. (But implementations would probably keep the 3-pointer design, rather than needing to multiply by sizeof(T) more often with one pointer and size_t length, capacity.It would also mean
alignof(int) had to be 1, unless an implementation padded 3-byte int to a larger size. Unless you redefine alignment to mean multiple-of-3 instead of multiple-of-2.Also, array indexing by an integer is normally just a left-shift to multiply by
sizeof(T) to get a byte offset.Perhaps such a machine would include scaled-index addressing modes so asm that could scale by 3 and 6, with a shift+add like
i + (i<<1) instead of just a shift. For example current x86-64 cCode Snippets
#include <limits>
int get_bit(unsigned int *bit_array, size_t n)
{
size_t element_idx = n / std::numeric_limits<unsigned int>::digits;
int bit_idx = n % std::numeric_limits<unsigned int>::digits;
unsigned chunk = bit_array[element_idx];
return (chunk >> bit_idx) & 1;
}Context
StackExchange Computer Science Q#164563, answer score: 11
Revisions (0)
No revisions yet.