patternModerate
Finding the two largest of five small integers as quickly as possible
Viewed 0 times
quicklythefivelargesttwopossiblesmallfindingintegers
Problem
I use a variation of a 5-cross median filter on image data
on a small embedded system, i.e.
The algorithm is really simple: read 5 unsigned integer values, get the highest 2, do some calculations on those and write back the unsigned integer result.
What is nice is that the 5 integer input values are all in the range of 0-20. The calculated integer value are also in the 0-20 range!
Through profiling, I have figured out that getting the largest two numbers is the bottleneck so I want to speed this part up. What is the fastest way to perform this selection?
The current algorithm uses a 32 bit mask with 1 in the position given by the 5 numbers and a HW-supported CLZ function.
I should say that the CPU is a proprietary one, not available outside of my company. My compiler is GCC but tailor made for this CPU.
I have tried to figure out if I can use a lookup-table but I have failed to generate a key that I can use.
I have $21^5$ combinations for the input but order isn't important, i.e.
It happens that the hash-function below produces a perfect hash without collisions!
But the hash is huge and there is simply not enough memory to use that.
Is there a better algorithm that I can use?
Is it possible to solve my problem using a lookup-table and generating a key?
on a small embedded system, i.e.
x
x x x
xThe algorithm is really simple: read 5 unsigned integer values, get the highest 2, do some calculations on those and write back the unsigned integer result.
What is nice is that the 5 integer input values are all in the range of 0-20. The calculated integer value are also in the 0-20 range!
Through profiling, I have figured out that getting the largest two numbers is the bottleneck so I want to speed this part up. What is the fastest way to perform this selection?
The current algorithm uses a 32 bit mask with 1 in the position given by the 5 numbers and a HW-supported CLZ function.
I should say that the CPU is a proprietary one, not available outside of my company. My compiler is GCC but tailor made for this CPU.
I have tried to figure out if I can use a lookup-table but I have failed to generate a key that I can use.
I have $21^5$ combinations for the input but order isn't important, i.e.
[5,0,0,0,5] is the same as [5,5,0,0,0].It happens that the hash-function below produces a perfect hash without collisions!
def hash(x):
h = 0
for i in x:
h = 33*h+i
return hBut the hash is huge and there is simply not enough memory to use that.
Is there a better algorithm that I can use?
Is it possible to solve my problem using a lookup-table and generating a key?
Solution
In my other answer I suggest that
conditional jumps might be the main impediment to efficiency. As a consequence,
sorting networks come to mind:
they are data agnostic, that is the same sequence of comparisons is executed no
matter the input, with only the swaps being conditional.
Of course, sorting may be too much work; we only need the biggest two numbers.
Lucky for us, selection networks have also been studied. Knuth tells us
that finding the two smallest numbers out of five² can be done with
$\hat{U}_2(5) = 6$ comparisons [1, 5.3.4 ex 19] (and at most as many swaps).
The network he gives in the solutions (rewritten to zero-based arrays) is
$\qquad\displaystyle [0:4]\,[1:4]\,[0:3]\,[1:3]\,[0:2]\,[1:2]$
which implements -- after adjusting the direction of the comparisons -- in
pseudocode as
Now, naive implementations still have conditional jumps (across the swap code).
Depending on your machine you can cirumvent them with conditional instructions,
though. x86 seems to be its usual mudpit self; ARM looks more promising since
apparently most operations are conditional
in themselves. If I understand the instructions
correctly, the first swap translates to this, assuming our array values have been
loaded to registers
Yes, yes, of course you can use XOR swapping
with EOR.
I just hope your processor has this, or something similar. Of course, if you build the thing for this purpose, maybe you can get the network hard-wired on there?
This is probably (provably?) the best you can do in the classical realm, i.e.
without making use of the limited domain and performing wicked intra-word magicks.
by Donald E. Knuth; The Art of Computer Programming Vol. 3 (2nd ed, 1998)
conditional jumps might be the main impediment to efficiency. As a consequence,
sorting networks come to mind:
they are data agnostic, that is the same sequence of comparisons is executed no
matter the input, with only the swaps being conditional.
Of course, sorting may be too much work; we only need the biggest two numbers.
Lucky for us, selection networks have also been studied. Knuth tells us
that finding the two smallest numbers out of five² can be done with
$\hat{U}_2(5) = 6$ comparisons [1, 5.3.4 ex 19] (and at most as many swaps).
The network he gives in the solutions (rewritten to zero-based arrays) is
$\qquad\displaystyle [0:4]\,[1:4]\,[0:3]\,[1:3]\,[0:2]\,[1:2]$
which implements -- after adjusting the direction of the comparisons -- in
pseudocode as
def selMax2(a : int[])
a.swap(0,4) if a[0] < a[4]
a.swap(1,4) if a[1] < a[4]
a.swap(0,3) if a[0] < a[3]
a.swap(1,3) if a[1] < a[3]
a.swap(0,2) if a[0] < a[2]
a.swap(1,2) if a[1] < a[2]
return (a[0], a[1])
endNow, naive implementations still have conditional jumps (across the swap code).
Depending on your machine you can cirumvent them with conditional instructions,
though. x86 seems to be its usual mudpit self; ARM looks more promising since
apparently most operations are conditional
in themselves. If I understand the instructions
correctly, the first swap translates to this, assuming our array values have been
loaded to registers
R0 through R4:CMP R0,R4
MOVLT R5 = R0
MOVLT R0 = R4
MOVLT R4 = R6Yes, yes, of course you can use XOR swapping
with EOR.
I just hope your processor has this, or something similar. Of course, if you build the thing for this purpose, maybe you can get the network hard-wired on there?
This is probably (provably?) the best you can do in the classical realm, i.e.
without making use of the limited domain and performing wicked intra-word magicks.
- Sorting and Searching
by Donald E. Knuth; The Art of Computer Programming Vol. 3 (2nd ed, 1998)
- Note that this leaves the two selected elements unordered. Ordering them requires an extra comparison, that is $\hat{W}_2(5) = 7$ many in total [1, p234 Table 1].
Code Snippets
def selMax2(a : int[])
a.swap(0,4) if a[0] < a[4]
a.swap(1,4) if a[1] < a[4]
a.swap(0,3) if a[0] < a[3]
a.swap(1,3) if a[1] < a[3]
a.swap(0,2) if a[0] < a[2]
a.swap(1,2) if a[1] < a[2]
return (a[0], a[1])
endCMP R0,R4
MOVLT R5 = R0
MOVLT R0 = R4
MOVLT R4 = R6Context
StackExchange Computer Science Q#39777, answer score: 11
Revisions (0)
No revisions yet.