Aligning your heterogeneous uninitialized memory to make the processor happy
Problem
After learning more about memory alignment and how it can impact processor data access, I tried to find something in the standard that offers proper memory alignment inside blocks of raw memory that will be used to hold heterogeneous types. With no luck, I had to come up with a solution.
The problem
Assume we have a variadic template whose parameter pack has the form \$T_0, T_1, ..., T_{n - 1}, T_n\$.
We can't just construct \$T_n\$ directly after \$T_{n-1}\$, that is, at an offset advanced by only sizeof( \$T_{n-1}\$ ), because it could land on an unaligned address. For example, for the list char, int and a std::aligned_storage_t storage, doing:
::new ( &storage ) char{}; // OK; char can go at any address in terms of alignment
::new ( reinterpret_cast< char* >( &storage ) + sizeof( char ) ) int{}; // byte offset 1: now misaligned!
This would look like so in memory:
bytes: [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ]
contents: [ c_0 ][ i_0 ][ i_1 ][ i_2 ][ i_3 ]
Which is clearly unaligned for int. For an int that follows a char to be properly aligned, it should look like this in memory:
bytes: [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ][ 6 ][ 7 ]
contents: [ c_0 ][ pad ][ pad ][ pad ][ i_0 ][ i_1 ][ i_2 ][ i_3 ]
This is exactly what your compiler does when you declare a class in order to maintain alignment:
struct my_type ---- struct my_type
{ ---- {
char c; ---- char c;
int i; ---- char padding[ 3 ]; // courtesy of the compiler
}; ---- int i;
---- };
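The padding described above can be checked at compile time. The following is a minimal sketch, not part of the original post: it re-declares my_type and adds a hypothetical helper, padding_before_i, to report how many bytes the compiler inserted. On a typical platform where alignof( int ) is 4, that is 3 bytes, but the assertions are written so they do not depend on that value.

```cpp
#include <cstddef>  // offsetof, std::size_t

struct my_type
{
    char c;
    int  i;
};

// Hypothetical helper (not from the post): bytes of padding between c and i.
constexpr std::size_t padding_before_i()
{
    return offsetof( my_type, i ) - sizeof( char );
}

// i must start at a multiple of alignof( int ), so the compiler pads after c.
static_assert( offsetof( my_type, i ) % alignof( int ) == 0,
               "i starts on an int-aligned offset" );

// Padding can only add bytes, never remove them.
static_assert( sizeof( my_type ) >= sizeof( char ) + sizeof( int ),
               "the struct is at least as large as its members" );
```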
Solution
For some variadic template, I present a way to automatically generate offsets into aligned memory for the respective types in the variadic template. This allows me to in-place construct types while maintaining proper address alignment so that performance is optimal.
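One way such offsets might be generated is sketched here. This is not the author's program: offset_table, make_offsets and align_up are hypothetical names. The sketch assumes C++14 relaxed constexpr, a non-empty type list, and a storage block whose starting address is already aligned for the strictest type in the list.

```cpp
#include <cstddef>  // std::size_t

// Hypothetical helper: round offset up to the next multiple of align.
constexpr std::size_t align_up( std::size_t offset, std::size_t align )
{
    return ( offset % align == 0 ) ? offset
                                   : offset + ( align - offset % align );
}

// Hypothetical result type: one offset per type, plus the total bytes needed.
template < std::size_t N >
struct offset_table
{
    std::size_t offsets[ N ];  // offsets[ k ] is where the k-th type starts
    std::size_t total;         // bytes used up to the end of the last type
};

// Walk the type list in order, padding before each type as required.
// Assumes at least one type in Ts.
template < typename... Ts >
constexpr offset_table< sizeof...( Ts ) > make_offsets()
{
    constexpr std::size_t sizes[]  = { sizeof( Ts )... };
    constexpr std::size_t aligns[] = { alignof( Ts )... };

    offset_table< sizeof...( Ts ) > table{};
    std::size_t used = 0;
    for ( std::size_t k = 0; k < sizeof...( Ts ); ++k )
    {
        used = align_up( used, aligns[ k ] );  // skip padding bytes
        table.offsets[ k ] = used;             // k-th type starts here
        used += sizes[ k ];                    // reserve room for it
    }
    table.total = used;
    return table;
}

// For char followed by int, the int lands on the first int-aligned offset.
static_assert( make_offsets< char, int >().offsets[ 1 ] == alignof( int ),
               "int is placed after the padding" );
```

With such a table, the k-th object would be constructed at the storage's starting address, viewed as a char pointer, plus offsets[ k ].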
The program generates a compile-time offset map, where the \$n\$th offset is added to the starting address of the std::aligned_storage_t<> where the types will
Solution
I only see one easy speedup:
while ( memory_used % align != 0 )
    ++memory_used;
Why do multiple divisions? Just do a single division, get the remainder, and then add the appropriate amount to memory_used:
int remainder = memory_used % align;
if (remainder != 0)
    memory_used += align - remainder;
Code Snippets
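The loop and the single-remainder version from the answer can be compared side by side as constexpr functions. This is a sketch under assumptions: the function names are hypothetical, and std::size_t is assumed for memory_used and align. The third, mask-based variant does not come from the answer; it is valid only when align is a power of two, which the C++ standard requires of every alignment value.

```cpp
#include <cstddef>  // std::size_t

// Loop version: one modulo per byte of padding.
constexpr std::size_t round_up_loop( std::size_t memory_used, std::size_t align )
{
    while ( memory_used % align != 0 )
        ++memory_used;
    return memory_used;
}

// Single-remainder version, as the answer suggests: one modulo in total.
constexpr std::size_t round_up_once( std::size_t memory_used, std::size_t align )
{
    const std::size_t remainder = memory_used % align;
    return remainder != 0 ? memory_used + ( align - remainder ) : memory_used;
}

// Mask variant (not from the answer): requires align to be a power of two.
constexpr std::size_t round_up_mask( std::size_t memory_used, std::size_t align )
{
    return ( memory_used + align - 1 ) & ~( align - 1 );
}

// All three agree for an unaligned and an already aligned input.
static_assert( round_up_loop( 5, 4 ) == 8 && round_up_once( 5, 4 ) == 8
               && round_up_mask( 5, 4 ) == 8, "5 rounds up to 8" );
static_assert( round_up_loop( 8, 4 ) == 8 && round_up_once( 8, 4 ) == 8
               && round_up_mask( 8, 4 ) == 8, "8 stays at 8" );
```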
while ( memory_used % align != 0 )
    ++memory_used;

int remainder = memory_used % align;
if (remainder != 0)
    memory_used += align - remainder;
Context
StackExchange Code Review Q#134063, answer score: 2