Aligning your heterogeneous uninitialized memory to make the processor happy

Submitted by: @import:stackexchange-codereview··

Problem

After learning more about memory alignment and how it can impact processor data access, I tried to find something in the standard that offers proper memory alignment inside blocks of raw memory that will be used to hold heterogeneous types. With no luck, I had to come up with a solution.

The problem

Assume we have a variadic template parameter pack of the form \$T_0, T_1, ..., T_{n - 1}, T_n\$.

We can't just construct \$T_n\$ at offset sizeof( \$T_{n-1}\$ ), because the object could end up at a misaligned address. For example, for the type list <char, int> and a std::aligned_storage_t buffer named storage, doing:

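::new ( &storage ) char{}; // OK; can go into any address in terms of alignment
// Cast to a byte pointer first: &storage + 1 would step by sizeof( storage ), not by one byte.
::new ( reinterpret_cast< unsigned char* >( &storage ) + sizeof( char ) ) int{}; // now misaligned!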


This would look like so in memory:

bytes:    [  0  ][  1  ][  2  ][  3  ][  4  ]
contents: [ c_0 ][ i_0 ][ i_1 ][ i_2 ][ i_3 ]


This is clearly misaligned for int (assuming a 4-byte int with 4-byte alignment). For an int that follows a char to be properly aligned, it should look like this in memory:

bytes:    [  0  ][  1  ][  2  ][  3  ][  4  ][  5  ][  6  ][  7  ]
contents: [ c_0 ][ pad ][ pad ][ pad ][ i_0 ][ i_1 ][ i_2 ][ i_3 ]
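
The two layouts above can be checked at run time. The following is a minimal sketch, not part of the original question: is_aligned_for and buffer are hypothetical names, and the check assumes that converting a pointer to std::uintptr_t preserves the numeric address, which is implementation-defined but holds on mainstream platforms.

```cpp
#include <cstdint>   // std::uintptr_t

// Hypothetical helper: true if p is suitably aligned for an object of type T.
// Assumes the pointer-to-integer conversion yields the numeric address.
template < typename T >
bool is_aligned_for( const void* p )
{
    return reinterpret_cast< std::uintptr_t >( p ) % alignof( T ) == 0;
}

// A byte buffer whose start is aligned for int, standing in for the raw storage.
alignas( int ) unsigned char buffer[ 2 * sizeof( int ) ];

// buffer + 1 is where the int lands in the first diagram.
// This returns false whenever alignof( int ) > 1.
bool misplaced_ok()
{
    return is_aligned_for< int >( buffer + 1 );
}

// buffer + alignof( int ) is where the int lands in the second diagram.
// This returns true because buffer itself is aligned for int.
bool padded_ok()
{
    return is_aligned_for< int >( buffer + alignof( int ) );
}
```

On a platform with a 4-byte int, misplaced_ok() corresponds to byte 1 and padded_ok() to byte 4 of the diagrams.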


This is exactly what your compiler does when you declare a class in order to maintain alignment:

struct my_type         ----         struct my_type
{                      ----         {
    char c;            ----             char c;
    int i;             ----             char padding[ 3 ]; // courtesy of the compiler
};                     ----             int i;
                       ----         };
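
The compiler-inserted padding can be observed with offsetof. This is a small sketch under the assumption that my_type is declared exactly as on the left-hand side above; padding_after_c is a hypothetical helper added for illustration. The exact amount of padding is implementation-defined, so the static_asserts only check properties that every conforming implementation provides.

```cpp
#include <cstddef>   // offsetof, std::size_t

// Same declaration as in the text; any padding is added by the compiler.
struct my_type
{
    char c;
    int i;
};

// A member's offset is always a multiple of that member's alignment.
static_assert( offsetof( my_type, i ) % alignof( int ) == 0,
               "i must sit at an offset aligned for int" );

// The whole struct is padded so that arrays of my_type stay aligned too.
static_assert( sizeof( my_type ) % alignof( my_type ) == 0,
               "sizeof is a multiple of the struct's alignment" );

// Hypothetical helper: number of padding bytes between c and i.
constexpr std::size_t padding_after_c()
{
    return offsetof( my_type, i ) - sizeof( char );
}
```

With a 4-byte int aligned to 4 bytes, padding_after_c() would be expected to return 3, matching the char padding[ 3 ] shown above, although the standard does not guarantee that value.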


Solution

For some variadic template, I present a way to automatically generate offsets into aligned memory for the respective types in the variadic template. This allows me to in-place construct types while maintaining proper address alignment so that performance is optimal.

The program generates a compile-time offset map, where the \$n\$th offset is added to the starting address of the std::aligned_storage_t<> where the types will
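
The author's implementation is not reproduced here, so the following is only a sketch of how such a compile-time offset map could be built, assuming C++14 constexpr. The names align_up, offset_table, make_offset_table and example_table are hypothetical and do not come from the original post.

```cpp
#include <cstddef>   // std::size_t

// Round offset up to the next multiple of align (align must be non-zero).
constexpr std::size_t align_up( std::size_t offset, std::size_t align )
{
    return ( offset + align - 1 ) / align * align;
}

// Hypothetical result type: one offset per type, plus the bytes needed in total.
template < std::size_t N >
struct offset_table
{
    std::size_t offsets[ N ];
    std::size_t total_size;
};

// Walk the type list in order, padding before each type as a struct would.
template < typename First, typename... Rest >
constexpr offset_table< 1 + sizeof...( Rest ) > make_offset_table()
{
    constexpr std::size_t count    = 1 + sizeof...( Rest );
    constexpr std::size_t sizes[]  = { sizeof( First ), sizeof( Rest )... };
    constexpr std::size_t aligns[] = { alignof( First ), alignof( Rest )... };

    offset_table< count > table{};
    std::size_t used = 0;
    for ( std::size_t i = 0; i < count; ++i )
    {
        used = align_up( used, aligns[ i ] );   // insert padding if needed
        table.offsets[ i ] = used;              // the i-th type starts here
        used += sizes[ i ];                     // reserve room for the object
    }
    table.total_size = used;
    return table;
}

// Example: offsets for <char, int, double>, computed at compile time.
constexpr auto example_table = make_offset_table< char, int, double >();

static_assert( example_table.offsets[ 0 ] == 0, "first type starts at 0" );
static_assert( example_table.offsets[ 1 ] % alignof( int ) == 0, "int aligned" );
static_assert( example_table.offsets[ 2 ] % alignof( double ) == 0, "double aligned" );
```

The offsets are only meaningful if the storage itself starts at an address aligned for the strictest type in the list, so the buffer would need to be declared with at least that alignment.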

Solution

I only see one easy speedup:

while ( memory_used % align != 0 )
     ++memory_used;


Why do multiple divisions? Do a single division, take the remainder, and add whatever is needed to bring memory_used up to the next multiple of align:

const auto remainder = memory_used % align; // keeps the operands' type instead of narrowing to int
if ( remainder != 0 )
    memory_used += align - remainder;
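
To show that the two versions agree, both can be wrapped in constexpr functions and compared at compile time. This is a sketch assuming C++14 and std::size_t operands; the function names round_up_loop and round_up_remainder are hypothetical, since the original code updates memory_used in place.

```cpp
#include <cstddef>   // std::size_t

// The original loop: up to align - 1 iterations, one modulo per iteration.
constexpr std::size_t round_up_loop( std::size_t memory_used, std::size_t align )
{
    while ( memory_used % align != 0 )
        ++memory_used;
    return memory_used;
}

// The suggested replacement: a single modulo, then one addition if needed.
constexpr std::size_t round_up_remainder( std::size_t memory_used, std::size_t align )
{
    const std::size_t remainder = memory_used % align;
    return remainder != 0 ? memory_used + ( align - remainder ) : memory_used;
}

// Both versions produce the same result on a few representative inputs.
static_assert( round_up_loop( 1, 4 ) == 4 && round_up_remainder( 1, 4 ) == 4,
               "1 rounds up to 4" );
static_assert( round_up_loop( 8, 4 ) == 8 && round_up_remainder( 8, 4 ) == 8,
               "already aligned values are unchanged" );
static_assert( round_up_loop( 13, 8 ) == 16 && round_up_remainder( 13, 8 ) == 16,
               "13 rounds up to 16" );
```

Because the offsets are computed at compile time, the gain is in compile time rather than run time, but the single-remainder form is also shorter to read.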

Context

StackExchange Code Review Q#134063, answer score: 2
