HiveBrain v1.2.0
Get Started
← Back to all entries
patternModerate

Finding the two largest of five small integers as quickly as possible

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
quicklythefivelargesttwopossiblesmallfindingintegers

Problem

I use a variation of a 5-cross median filter on image data
on a small embedded system, i.e.

x
  x x x
    x


The algorithm is really simple: read 5 unsigned integer values, get the highest 2, do some calculations on those and write back the unsigned integer result.

What is nice is that the 5 integer input values are all in the range of 0-20. The calculated integer value are also in the 0-20 range!

Through profiling, I have figured out that getting the largest two numbers is the bottleneck so I want to speed this part up. What is the fastest way to perform this selection?

The current algorithm uses a 32 bit mask with 1 in the position given by the 5 numbers and a HW-supported CLZ function.

I should say that the CPU is a proprietary one, not available outside of my company. My compiler is GCC but tailor made for this CPU.

I have tried to figure out if I can use a lookup-table but I have failed to generate a key that I can use.

I have $21^5$ combinations for the input but order isn't important, i.e. [5,0,0,0,5] is the same as [5,5,0,0,0].

It happens that the hash-function below produces a perfect hash without collisions!

def hash(x):
    h = 0
    for i in x:
        h = 33*h+i
    return h


But the hash is huge and there is simply not enough memory to use that.

Is there a better algorithm that I can use?
Is it possible to solve my problem using a lookup-table and generating a key?

Solution

In my other answer I suggest that
conditional jumps might be the main impediment to efficiency. As a consequence,
sorting networks come to mind:
they are data agnostic, that is the same sequence of comparisons is executed no
matter the input, with only the swaps being conditional.

Of course, sorting may be too much work; we only need the biggest two numbers.
Lucky for us, selection networks have also been studied. Knuth tells us
that finding the two smallest numbers out of five² can be done with
$\hat{U}_2(5) = 6$ comparisons [1, 5.3.4 ex 19] (and at most as many swaps).

The network he gives in the solutions (rewritten to zero-based arrays) is

$\qquad\displaystyle [0:4]\,[1:4]\,[0:3]\,[1:3]\,[0:2]\,[1:2]$

which implements -- after adjusting the direction of the comparisons -- in
pseudocode as

def selMax2(a : int[])
  a.swap(0,4) if a[0] < a[4]
  a.swap(1,4) if a[1] < a[4]
  a.swap(0,3) if a[0] < a[3]
  a.swap(1,3) if a[1] < a[3]
  a.swap(0,2) if a[0] < a[2]
  a.swap(1,2) if a[1] < a[2]
  return (a[0], a[1])
end


Now, naive implementations still have conditional jumps (across the swap code).
Depending on your machine you can cirumvent them with conditional instructions,
though. x86 seems to be its usual mudpit self; ARM looks more promising since
apparently most operations are conditional
in themselves. If I understand the instructions
correctly, the first swap translates to this, assuming our array values have been
loaded to registers R0 through R4:

CMP     R0,R4
MOVLT   R5 = R0
MOVLT   R0 = R4
MOVLT   R4 = R6


Yes, yes, of course you can use XOR swapping
with EOR.

I just hope your processor has this, or something similar. Of course, if you build the thing for this purpose, maybe you can get the network hard-wired on there?

This is probably (provably?) the best you can do in the classical realm, i.e.
without making use of the limited domain and performing wicked intra-word magicks.

  • Sorting and Searching


by Donald E. Knuth; The Art of Computer Programming Vol. 3 (2nd ed, 1998)

  • Note that this leaves the two selected elements unordered. Ordering them requires an extra comparison, that is $\hat{W}_2(5) = 7$ many in total [1, p234 Table 1].

Code Snippets

def selMax2(a : int[])
  a.swap(0,4) if a[0] < a[4]
  a.swap(1,4) if a[1] < a[4]
  a.swap(0,3) if a[0] < a[3]
  a.swap(1,3) if a[1] < a[3]
  a.swap(0,2) if a[0] < a[2]
  a.swap(1,2) if a[1] < a[2]
  return (a[0], a[1])
end
CMP     R0,R4
MOVLT   R5 = R0
MOVLT   R0 = R4
MOVLT   R4 = R6

Context

StackExchange Computer Science Q#39777, answer score: 11

Revisions (0)

No revisions yet.