HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavaMinor

Finding an element in multiple sorted lists efficiently

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
efficientlyfindingelementmultiplelistssorted

Problem

I have \$k\$ sorted Lists, each of size \$n\$. Currently I have hard-coded 5 sorted Lists each of size 3, but in general that is configurable.

I would like to search for single element in each of the \$k\$ Lists (or its predecessor, if it doesn't exist).

Obviously I can binary search each List individually, resulting in a \$O(k*log(n))\$ runtime. But I think we can do better than that: after all, we're doing the same search \$k\$ times.

Could this be improved?

private static TreeSet tree = new TreeSet();

public SearchItem(final List> inputs) {
    tree = new TreeSet();
    for (List input : inputs) {
        tree.addAll(input);
    }
}

public Integer getItem(final Integer x) {
    if(tree.contains(x)) {
        return x;
    } else {
        return tree.higher(x);
    }
}

public static void main(String[] args) {
    List> lists = new ArrayList>();

    List list1 = new ArrayList(Arrays.asList(3, 4, 6));
    List list2 = new ArrayList(Arrays.asList(1, 2, 3));
    List list3 = new ArrayList(Arrays.asList(2, 3, 6));
    List list4 = new ArrayList(Arrays.asList(1, 2, 3));
    List list5 = new ArrayList(Arrays.asList(4, 8, 13));

    lists.add(list1);
    lists.add(list2);
    lists.add(list3);
    lists.add(list4);
    lists.add(list5);

    SearchItem search = new SearchItem(lists);
    System.out.println(tree);

    Integer dataOuput = search.getItem(5);

    System.out.println(dataOuput);
}

Solution

I think you have some misguided assumptions here.

For a start, what you have now is not \$O(k \log(n))\$, it first scans and sorts all the data in to the tree, which is a \$O(N \log(N))\$ operation, where \$N\$ is the cumulative data size (sum of input-array sizes). Then, the check on it, is a \$O(\log(N))\$ check, so, your algorithm is in the order of \$O(N\log(N))\$.

The complexity you are worried about \$O(k\log(n))\$ is really quite trivial. \$k\$ is a small number (5), and log(n) is always small... in many ways, after n = 128 it is effectively a constant....

So, I would do the following:

  • binary search each List



  • keep the 'min' value



with the code:

Integer result = null;
for (List data : lists) {
    int pos = Collections.binarySearch(data, input);
    if (pos >= 0) {
        //exact match, return
        return data.get(pos);
    }
    pos = -pos - 1;
    if (pos  0) {
            result = found;
        }
    }
}
return result;


Now, if you wanted to be fancy, and you wanted the fastest response times, you could put each binary search in to a Callable, and run them in parallel....

Then rank the results, and teturn it all in time complexity \$O(\log(n))\$ assuming \$k\$ is less than your hardware CPU count.

Code Snippets

Integer result = null;
for (List<Integer> data : lists) {
    int pos = Collections.binarySearch(data, input);
    if (pos >= 0) {
        //exact match, return
        return data.get(pos);
    }
    pos = -pos - 1;
    if (pos < data.size()) {
        Integer found = data.get(pos);
        if (result == null || result.compareTo(found) > 0) {
            result = found;
        }
    }
}
return result;

Context

StackExchange Code Review Q#42982, answer score: 8

Revisions (0)

No revisions yet.