HiveBrain v1.2.0
Tags: pattern, java · Priority: Critical

Searching in an array in less than O(n) time

Submitted by: @import:stackexchange-codereview

Problem

I have an array where each element is either one less or one greater than the preceding element, i.e. \$x_i = x_{i-1} \pm 1\$. I wish to find an element in it in less than \$O(n)\$ time. I've implemented it like this:

public int searchArray(int[] arr, int i, int elem) {

    if (i > arr.length - 1 || i < 0) {
        return -1;
    }

    if (arr[i] == elem) {
        return i;
    } else {
        int diff = Math.abs(elem - arr[i]);
        int index = searchArray(arr, i + diff, elem);
        if (index == -1) {
            index = searchArray(arr, i - diff, elem);
            if (index == -1) {
                return -1;
            }
        }
        return index;
    }
}


And calling it like:

int[] arr = {2, 1, 2, 3, 4, 3, 2, 3, 4, 5, 6, 5, 4, 3, 4};
int index = searchArray(arr, 0, 3);


It works fine, but can it be improved? Specifically, is there a way to do it iteratively? And is the current algorithm less than \$O(n)\$? I guess it is, but I'm not sure.

Solution

While other answers make good points, I have to wonder why you are using recursion. This is such a simple problem to solve with a for loop.

I assume that you always start the search from index 0 rather than from an arbitrary index, so consider the following routine:

public int searchArray(int[] arr, int elem) {

    for (int i = 0; i < arr.length; ) {
        if (arr[i] == elem) {
            return i;
        }
        i += Math.abs(elem - arr[i]);
    }
    return -1;
}


(If you need to start the search part-way through the array, you can add the offset input parameter back and start i from that offset.)
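For reference, here is the iterative routine called on the sample array from the question (a minimal sketch; the wrapper class name Search is mine):

```java
public class Search {

    // Iterative version: skip ahead by |elem - arr[i]| each step, since
    // adjacent elements differ by exactly 1, so elem cannot occur sooner.
    public static int searchArray(int[] arr, int elem) {
        for (int i = 0; i < arr.length; ) {
            if (arr[i] == elem) {
                return i;
            }
            i += Math.abs(elem - arr[i]);
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] arr = {2, 1, 2, 3, 4, 3, 2, 3, 4, 5, 6, 5, 4, 3, 4};
        System.out.println(searchArray(arr, 3));  // first match is at index 3
        System.out.println(searchArray(arr, 6));  // 6 occurs at index 10
        System.out.println(searchArray(arr, 9));  // not present: -1
    }
}
```

Note that unlike the recursive original, the loop only ever moves forward, so it needs no bounds juggling and cannot revisit an index.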

The bottom line is that recursion is overkill here. This loop is still \$O(n)\$, but the cost of each iteration is lower than that of the equivalent recursive call.

I am not aware of any way to solve the given problem with better than \$O(n)\$ complexity.

Discussion on complexity - why this is \$O(n)\$

This answer has generated a lot of discussion about complexity: the claim is that, because this method only ever scans at most half the members of the input array, it should have complexity \$O \left( \frac{n}{2} \right )\$ instead of \$O(n)\$. The argument given is something like:


Consider the worst-case data 1,2,1,2,1,2,1,2,1,2,1,2 and the search term 3. For this situation the method will start at data[0], then skip to data[2], then data[4], and so on. It will never inspect data[1] or any other odd-indexed data point. If the search term is even more 'different' from the actual data (e.g. 100), the method will do only one comparison at data[0] and will then return 'not found' (-1).

This is an interesting observation: the method never needs to scan more than half the data. It is especially interesting when compared with a 'naive' method that scans the data one member at a time and returns when it finds the value. That 'naive' method most certainly has \$ O\left(n\right) \$ 'performance' and complexity, and the 'skip-method' will be more than twice as fast.
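That observation is easy to verify by instrumenting both methods with a comparison counter (a sketch; the class name, counter fields, and worstCase helper are mine):

```java
public class SkipVsNaive {

    static int skipComparisons;
    static int naiveComparisons;

    // Skip-method: jumps ahead by |elem - arr[i]| each step.
    static int skipSearch(int[] arr, int elem) {
        for (int i = 0; i < arr.length; ) {
            skipComparisons++;
            if (arr[i] == elem) {
                return i;
            }
            i += Math.abs(elem - arr[i]);
        }
        return -1;
    }

    // Naive method: inspects every element in turn.
    static int naiveSearch(int[] arr, int elem) {
        for (int i = 0; i < arr.length; i++) {
            naiveComparisons++;
            if (arr[i] == elem) {
                return i;
            }
        }
        return -1;
    }

    // Worst-case data 1,2,1,2,... of the given length.
    static int[] worstCase(int n) {
        int[] arr = new int[n];
        for (int i = 0; i < n; i++) {
            arr[i] = (i % 2 == 0) ? 1 : 2;
        }
        return arr;
    }

    public static void main(String[] args) {
        int[] arr = worstCase(12);
        skipSearch(arr, 3);   // lands only on even indices 0, 2, 4, ...
        naiveSearch(arr, 3);  // inspects every index
        System.out.println(skipComparisons);   // 6  (half the array)
        System.out.println(naiveComparisons);  // 12 (the whole array)
    }
}
```

Searching for 3 in alternating 1,2 data always yields diff = 2, so the skip-method touches exactly the even indices: half the elements, as claimed.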

The important thing to note though, is how the algorithms scale relative to the amount of data, not relative to each other!

So, consider a hypothetical set of worst-case data 1,2,1,2,.... and the search term 3. Suppose this data happens to be searched in 4 milliseconds by the skip-method and in 8 milliseconds by the naive method. Now double the amount of data: what happens? The processing time for both methods will double!
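The same doubling shows up if we count comparisons instead of milliseconds (a sketch; the class name and skipSteps helper are mine): going from 12 elements to 24 doubles the skip-method's work, exactly as it doubles the naive method's.

```java
public class ScalingDemo {

    // Comparisons the skip-method makes on 1,2,1,2,... data of length n
    // when searching for 3: diff is always 2, so it lands on every even
    // index and makes n/2 comparisons before giving up.
    static int skipSteps(int n) {
        int[] arr = new int[n];
        for (int i = 0; i < n; i++) {
            arr[i] = (i % 2 == 0) ? 1 : 2;
        }
        int steps = 0;
        for (int i = 0; i < arr.length; ) {
            steps++;
            if (arr[i] == 3) {
                return steps;
            }
            i += Math.abs(3 - arr[i]);
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(skipSteps(12)); // 6
        System.out.println(skipSteps(24)); // 12: double the data, double the work
    }
}
```

(The naive method trivially makes n comparisons on the same data, so it doubles too: 12 becomes 24.)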

In both cases, the running time doubles for each doubling of the data volume. This is what makes both algorithms \$ O(n) \$ complexity. From Wikipedia:


In computer science, big O notation is used to classify algorithms by how they respond (e.g., in their processing time or working space requirements) to changes in input size.

Reversing the argument: if the skip-method really had \$O \left( \frac{n}{2} \right) \$ complexity in the sense intended, I would expect that doubling the data would increase the execution time by only a half, or 50%. This is 'obviously' not true for the skip-method (nor for the naive method).

Both methods have \$O(n)\$ complexity because they both scale the same way with increasing volumes of data. In big-O notation, constant factors such as the \$\frac{1}{2}\$ are discarded, which is why \$O \left( \frac{n}{2} \right)\$ and \$O(n)\$ denote the same complexity class.

But, just because they scale the same way, does not mean that one method is not better than the other... obviously.


Context

StackExchange Code Review Q#36547, answer score: 61
