HiveBrain v1.2.0
Get Started
← Back to all entries
patterncMinor

Find binary sequences with low peak sidelobe level of autocorrelation function

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
peaklowwithlevelfunctionautocorrelationbinaryfindsequencessidelobe

Problem

I would like to find all binary sequences from the specified range, which has low peak sidelobe level of its autocorrelation function.

Here is the solution (this is simplified version of the real code) using GCC Inline Assembly with AT&T syntax. Binary sequences are represented as sequence of bits in the integer number. The example is a complete C program which finds and prints the 13-position Barker Code.

#include 
#include 
#include 

// A helper function, which is performed very very rare.
void SaveCode (const uint8_t length, const uint64_t code);

int main (int argc, char * argv [])
{
    const uint8_t  sideLobeLimit=  1;
    const uint8_t  length       = 13;
    const uint64_t beginCode    =  1ull > i) & 0x01 ? printf ("+") : printf ("-");
    }
    printf ("\n");
}


It can be built with:

gcc main.c -o main


I'd like to hear any suggestions about how to:

  • improve the algorithm;



  • improve performance (use some tricks or instruction reordering or something else);



  • improve comments (content and translation);



  • improve readability (set aliases for CPU registers if it's possible or something else);



  • any other suggestions

Solution

I have some suggestions for improving the algorithm's performance:

  • Seriously consider writing it in C, with inline assembly for popcount (or use the gcc __builtin_popcount(var)). Compilers have register allocators, can inline the output function, and should be able to schedule this relatively simple code quite well. It would also make it easier to unroll the loop, either manually or by asking the compiler to do it.



  • Use register constraints rather than explicit registers. The constraints are aliases for the register (google "gcc inline assembly"). If the compiler decides which registers are used, it can reuse their contents decide what to clobber. It can also reduce clobbering at call sites.



  • Iterate o - l rather than o. It simplifies your test to a test against zero, which saves one instruction.



  • If possible, use 32-bit integers for arithmetic. Modern PCs are optimized for 32-bit arithmetic, and the instructions are shorter.



  • Try alternative methods for computing the absolute value. If there is a 50/50 distribution of positive and negative sidelobes, then other methods are likely faster.



  • Construct the output string in an array on the stack, and output the string with puts (remember the terminating null character). The repeated putc calls can be far more expensive then the actual computation. You can ignore this comment if you do not plan to output the result to stdout.



  • Try if add 1 is faster than inc. inc does not change the carry flag, so it causes a dependency between previous and following instructions.



I also have a few comments for the readability:

  • Use one name for variables instead of many.



  • Switch to C, or at least provide C pseudocode.



  • Provide a link for the wiki page



You should also not use a raw rdtsc for performance readings because it does not order memory accesses. Use rdtscp or rdtsc with cpuid before and after.

Context

StackExchange Code Review Q#68443, answer score: 4

Revisions (0)

No revisions yet.