patterncppMinor
De-linearize a 2d linearized grid and calculate distance between 2 points
Viewed 0 times
linearizedgridpointsdistancebetweencalculateandlinearize
Problem
In a simulation of mine, for which performance is paramount, I have the following bottleneck.
I have a huge 2D grid of dimensions
Currently, what I do is:
How could I improve upon that? What is the most performant way to take two of such integer values, retrieve back
I have a huge 2D grid of dimensions
width and height, whose dimensions are stored in linearized form, i.e. positions at x and y are stored as one integer x + y*width. Every iteration I have to get thousands of such linearized coordinates, de-linearize an calculate distance between 2d coordinates.Currently, what I do is:
y1 = linPos1 / width; //int divided by int disregards remainders
x1 = linPos1 - width * y1;
y2 = linPos2 / width; //int divided by int disregards remainders
x2 = linPos2 - width * y2;
int xDiff = x2-x1;
int yDiff = y2-y1;
double distance = sqrt(xDiff*xDiff + yDiff*yDiff);How could I improve upon that? What is the most performant way to take two of such integer values, retrieve back
x and y and then calculate the 2D distance between said points? I am open to all suggestions, including SIMD instructions.Solution
What you are doing is packing the two integer coordinates into a single integer, then unpacking them.
Let us assume that \$\text{height} \leq \text{width}\$, let
Using these assumptions, the packing can be done by writing the coordinates in base
Using our assumptions this is an injective function, i.e. to one pair \$(x,y)\$ there exists at most one \$k\$ such that \$k=wy+x\$, and each such \$k\$ uniquely determine the \$(x,y)\$. This means that the packing-unpacking is well-defined, which is great: same points will have the same representations, and a representation corresponds to exactly one point.
Your code,
could be written a few different ways, e.g. using
or by using
Modern compilers will compile all of these to the same instruction:
So that's about as good as it gets, unless we can use some faster form of integer division(from now on, simply division).
If a number is in base \$b\$, then it's really easy to divide it by powers of \$b\$: just shift the numbers around; e.g.
That's all fine and well, but string operations aren't going to be faster than
Let's find the smallest \$n \in \mathbb N\$, such that \$w \leq 2^n\$. Let \$ w = 2^n\$. Perhaps you won't use a few coordinates, but that's not really a problem.
Now our division code will look like this.
Here
Now let's see if the compilers are clever enough to do this for us automatically.
Let's see the assembly code generated from the code above, if we allow the compiler to optimize a bit.
As you can see, besides some sign-checking, the two codes are identical. In fact, after specifying that we are working with unsigned integers, we get the same output:
This means that we can keep our more readable operators:
Now let's look at the last part of your code, and let's change it to a more general formula.
Where
Depending on your needs, you could use other metrics; as you are working on an integer lattice, I'd suggest you take a look at the taxicab metric. Its calculation is faster, and there is less room for numerical errors.
Implementing these changes result in code, which is a constant time faster, than your
original one.
While it was fun figuring out, I doubt this is the kind of speed up you were hoping for. I think optimizing other parts of your code could have greater benefits. Perhaps consider posting them as a separate question.
Let us assume that \$\text{height} \leq \text{width}\$, let
w = width. Translate the lattice so that \$ 0 \leq x,y\leq w-1\$.Using these assumptions, the packing can be done by writing the coordinates in base
w. For example, if w = 10, x=4, and y=7, then the representation is wy + x = 710 + 4 = 74. Using our assumptions this is an injective function, i.e. to one pair \$(x,y)\$ there exists at most one \$k\$ such that \$k=wy+x\$, and each such \$k\$ uniquely determine the \$(x,y)\$. This means that the packing-unpacking is well-defined, which is great: same points will have the same representations, and a representation corresponds to exactly one point.
Your code,
y = k / w;
x = k - w * y;could be written a few different ways, e.g. using
%,y = k / w
x = k % wor by using
std::div.Modern compilers will compile all of these to the same instruction:
idivl, so you don't have to worry about this part, just don't forget to tell your compiler to optimize slightly. For example, g++ needs the -O1 flag.So that's about as good as it gets, unless we can use some faster form of integer division(from now on, simply division).
If a number is in base \$b\$, then it's really easy to divide it by powers of \$b\$: just shift the numbers around; e.g.
1234/10^3 = 1, 1234 % 10^3 = 4.That's all fine and well, but string operations aren't going to be faster than
idivl. Luckily we don't need those, because by choosing \$w=2^n\$, we can leverage bit-shifts, and bitwise-and to do exactly the same for us, really fast.Let's find the smallest \$n \in \mathbb N\$, such that \$w \leq 2^n\$. Let \$ w = 2^n\$. Perhaps you won't use a few coordinates, but that's not really a problem.
Now our division code will look like this.
y = k >> n
x = k & ((1 << n) - 1)Here
x will be the first nth bits of k, and y will be the rest. For example, if k = 1101, n=2, then x = 1101 & ((1 > 2 = 0011 = 11.Now let's see if the compilers are clever enough to do this for us automatically.
int func1(int a) {
int x = a / 1024;
int y = a % 1024;
return x+y;
}
int func2(int a) {
int x = a >> 10;
int y = a & ((1 << 10) - 1);
return x+y;
}Let's see the assembly code generated from the code above, if we allow the compiler to optimize a bit.
_Z5func1i:
leal 1023(%rdi), %eax
testl %edi, %edi
movl %edi, %edx
cmovns %edi, %eax
sarl $31, %edx
shrl $22, %edx
sarl $10, %eax
addl %edx, %edi
andl $1023, %edi
subl %edx, %edi
addl %edi, %eax
ret
_Z5func2j:
movl %edi, %eax
andl $1023, %edi
shrl $10, %eax
addl %edi, %eax
retAs you can see, besides some sign-checking, the two codes are identical. In fact, after specifying that we are working with unsigned integers, we get the same output:
movl %edi, %eax
andl $1023, %edi
shrl $10, %eax
addl %edi, %eax
retThis means that we can keep our more readable operators:
%, and /. Hurray!Now let's look at the last part of your code, and let's change it to a more general formula.
double distance = metric(xDiff,yDiff);Where
metric is a function calculating a distance. In your case, metric corresponds to the Euclidean metric,double metric(int dx, int dy) {
return std::sqrt(dx*dx+dy*dy);
}Depending on your needs, you could use other metrics; as you are working on an integer lattice, I'd suggest you take a look at the taxicab metric. Its calculation is faster, and there is less room for numerical errors.
double metric(int dx, int dy) {
return std::abs(dx) + std::abs(dy);
}Implementing these changes result in code, which is a constant time faster, than your
original one.
While it was fun figuring out, I doubt this is the kind of speed up you were hoping for. I think optimizing other parts of your code could have greater benefits. Perhaps consider posting them as a separate question.
Code Snippets
y = k / w;
x = k - w * y;y = k / w
x = k % wy = k >> n
x = k & ((1 << n) - 1)int func1(int a) {
int x = a / 1024;
int y = a % 1024;
return x+y;
}
int func2(int a) {
int x = a >> 10;
int y = a & ((1 << 10) - 1);
return x+y;
}_Z5func1i:
leal 1023(%rdi), %eax
testl %edi, %edi
movl %edi, %edx
cmovns %edi, %eax
sarl $31, %edx
shrl $22, %edx
sarl $10, %eax
addl %edx, %edi
andl $1023, %edi
subl %edx, %edi
addl %edi, %eax
ret
_Z5func2j:
movl %edi, %eax
andl $1023, %edi
shrl $10, %eax
addl %edi, %eax
retContext
StackExchange Code Review Q#161212, answer score: 3
Revisions (0)
No revisions yet.