HiveBrain v1.2.0
Get Started
← Back to all entries
patterncppMinor

De-linearize a 2d linearized grid and calculate distance between 2 points

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
linearizedgridpointsdistancebetweencalculateandlinearize

Problem

In a simulation of mine, for which performance is paramount, I have the following bottleneck.

I have a huge 2D grid of dimensions width and height, whose dimensions are stored in linearized form, i.e. positions at x and y are stored as one integer x + y*width. Every iteration I have to get thousands of such linearized coordinates, de-linearize an calculate distance between 2d coordinates.

Currently, what I do is:

y1 = linPos1 / width; //int divided by int disregards remainders
x1 = linPos1 - width * y1;

y2 = linPos2 / width; //int divided by int disregards remainders
x2 = linPos2 - width * y2;

int xDiff = x2-x1;
int yDiff = y2-y1;
double distance = sqrt(xDiff*xDiff  + yDiff*yDiff);


How could I improve upon that? What is the most performant way to take two of such integer values, retrieve back x and y and then calculate the 2D distance between said points? I am open to all suggestions, including SIMD instructions.

Solution

What you are doing is packing the two integer coordinates into a single integer, then unpacking them.

Let us assume that \$\text{height} \leq \text{width}\$, let w = width. Translate the lattice so that \$ 0 \leq x,y\leq w-1\$.

Using these assumptions, the packing can be done by writing the coordinates in base w. For example, if w = 10, x=4, and y=7, then the representation is wy + x = 710 + 4 = 74.

Using our assumptions this is an injective function, i.e. to one pair \$(x,y)\$ there exists at most one \$k\$ such that \$k=wy+x\$, and each such \$k\$ uniquely determine the \$(x,y)\$. This means that the packing-unpacking is well-defined, which is great: same points will have the same representations, and a representation corresponds to exactly one point.

Your code,

y = k / w;
x = k - w * y;


could be written a few different ways, e.g. using %,

y = k / w
x = k % w


or by using std::div.

Modern compilers will compile all of these to the same instruction: idivl, so you don't have to worry about this part, just don't forget to tell your compiler to optimize slightly. For example, g++ needs the -O1 flag.

So that's about as good as it gets, unless we can use some faster form of integer division(from now on, simply division).

If a number is in base \$b\$, then it's really easy to divide it by powers of \$b\$: just shift the numbers around; e.g. 1234/10^3 = 1, 1234 % 10^3 = 4.

That's all fine and well, but string operations aren't going to be faster than idivl. Luckily we don't need those, because by choosing \$w=2^n\$, we can leverage bit-shifts, and bitwise-and to do exactly the same for us, really fast.

Let's find the smallest \$n \in \mathbb N\$, such that \$w \leq 2^n\$. Let \$ w = 2^n\$. Perhaps you won't use a few coordinates, but that's not really a problem.
Now our division code will look like this.

y = k >> n
x = k & ((1 << n) - 1)


Here x will be the first nth bits of k, and y will be the rest. For example, if k = 1101, n=2, then
x = 1101 & ((1 > 2 = 0011 = 11.

Now let's see if the compilers are clever enough to do this for us automatically.

int func1(int a) {
    int x = a / 1024;
    int y = a % 1024;
    return x+y;
}

int func2(int a) {
    int x = a >> 10;
    int y = a & ((1 << 10) - 1);
    return x+y;
}


Let's see the assembly code generated from the code above, if we allow the compiler to optimize a bit.

_Z5func1i:
    leal    1023(%rdi), %eax
    testl   %edi, %edi
    movl    %edi, %edx
    cmovns  %edi, %eax
    sarl    $31, %edx
    shrl    $22, %edx
    sarl    $10, %eax
    addl    %edx, %edi
    andl    $1023, %edi
    subl    %edx, %edi
    addl    %edi, %eax
    ret

_Z5func2j:
    movl    %edi, %eax
    andl    $1023, %edi
    shrl    $10, %eax
    addl    %edi, %eax
    ret


As you can see, besides some sign-checking, the two codes are identical. In fact, after specifying that we are working with unsigned integers, we get the same output:

movl    %edi, %eax
    andl    $1023, %edi
    shrl    $10, %eax
    addl    %edi, %eax
    ret


This means that we can keep our more readable operators: %, and /. Hurray!

Now let's look at the last part of your code, and let's change it to a more general formula.

double distance = metric(xDiff,yDiff);


Where metric is a function calculating a distance. In your case, metric corresponds to the Euclidean metric,

double metric(int dx, int dy) {
   return std::sqrt(dx*dx+dy*dy);
}


Depending on your needs, you could use other metrics; as you are working on an integer lattice, I'd suggest you take a look at the taxicab metric. Its calculation is faster, and there is less room for numerical errors.

double metric(int dx, int dy) {
   return std::abs(dx) + std::abs(dy);
}


Implementing these changes result in code, which is a constant time faster, than your
original one.

While it was fun figuring out, I doubt this is the kind of speed up you were hoping for. I think optimizing other parts of your code could have greater benefits. Perhaps consider posting them as a separate question.

Code Snippets

y = k / w;
x = k - w * y;
y = k / w
x = k % w
y = k >> n
x = k & ((1 << n) - 1)
int func1(int a) {
    int x = a / 1024;
    int y = a % 1024;
    return x+y;
}

int func2(int a) {
    int x = a >> 10;
    int y = a & ((1 << 10) - 1);
    return x+y;
}
_Z5func1i:
    leal    1023(%rdi), %eax
    testl   %edi, %edi
    movl    %edi, %edx
    cmovns  %edi, %eax
    sarl    $31, %edx
    shrl    $22, %edx
    sarl    $10, %eax
    addl    %edx, %edi
    andl    $1023, %edi
    subl    %edx, %edi
    addl    %edi, %eax
    ret

_Z5func2j:
    movl    %edi, %eax
    andl    $1023, %edi
    shrl    $10, %eax
    addl    %edi, %eax
    ret

Context

StackExchange Code Review Q#161212, answer score: 3

Revisions (0)

No revisions yet.