Counting the number of elements in a sorted array
Problem
Suppose I have a sorted array, and each element may appear multiple times.
To make it simple, here is an example:

```
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3]
```

The expected output is the number of each element. In this example, the results are:

```
number of 1 is 17
number of 2 is 6
number of 3 is 9
```

I want to find a solution where the algorithm time complexity is lower than \$O(n)\$. My idea is similar to binary search:

- When I meet equal neighbours, I jump the tail pointer ahead by 1, by 2, by 4, and so on, and at the same time move the cur pointer to the old tail position
- When I meet an unequal pair, I shrink the tail pointer back to the midpoint between cur and tail
- The difference between cur and the previous element's tail is the count of the current element

Any advice on algorithm time complexity improvement, code bugs or general code style is appreciated.

```
from collections import defaultdict
import random


def generate_count(numbers):
    last_index = -1
    result = defaultdict(int)
    cur_index = last_index + 1
    tail_index = cur_index + 1
    while cur_index < len(numbers):
        while cur_index < tail_index and cur_index < len(numbers) and tail_index < len(numbers):
            if numbers[cur_index] == numbers[tail_index]:
                pre = cur_index
                cur_index = tail_index
                tail_index += (tail_index - pre)
            else:
                tail_index = (cur_index + tail_index) / 2
        result[numbers[cur_index]] = cur_index - last_index
        last_index = cur_index
        cur_index = last_index + 1
        tail_index = cur_index + 1
    return result


if __name__ == "__main__":
    numbers = []
    for i in range(random.randint(1, 20)):
        numbers.append(1)
    print 'number of 1 is', i + 1
    for i in range(random.randint(1, 20)):
        numbers.append(2)
    print 'number of 2 is', i + 1
    for i in range(random.randint(1, 20)):
        numbers.append(3)
    print 'number of 3 is', i + 1
    print generate_count(numbers)
```
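As a side note on the approach itself: the same run-skipping idea can also be expressed with the standard library's `bisect` module, which performs the binary search for the end of each run directly. This is only an illustrative sketch (written for Python 3, unlike the original post's Python 2 code), not part of the question:

```python
from bisect import bisect_right

def count_runs(numbers):
    # Count occurrences of each value in a sorted list.
    # One binary search per distinct value: O(k log n) for k distinct values.
    counts = {}
    i = 0
    while i < len(numbers):
        # bisect_right(numbers, numbers[i], i) is the index just past the
        # last copy of numbers[i], i.e. the end of the current run.
        j = bisect_right(numbers, numbers[i], i)
        counts[numbers[i]] = j - i
        i = j
    return counts

print(count_runs([1] * 17 + [2] * 6 + [3] * 9))  # {1: 17, 2: 6, 3: 9}
```

On the example input above this does three binary searches instead of one scan, which is where any sub-linear behaviour would have to come from.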
Solution
> I want to find a solution [whose] algorithm time complexity is lower than \$O(n)\$.

You'll have to be more specific. What you asked for literally isn't possible; but sometimes people use "big-O" notation loosely, either out of laziness or because they haven't yet been exposed to the notation expressing their intended precise gradations of meaning.

Notice that if I give you the input

```
1 2 3 4 5 6 7 8 9 10 11 12 13 ...
```

then you must output

```
number of 1 is 1
number of 2 is 1
number of 3 is 1
number of 4 is 1
...
```

— that is, if the size of the input is \$n\$, then the worst-case size of the output is \$O(n)\$. Printing those \$O(n)\$ bytes of output will take \$O(n)\$ time just by itself. Anything else the program does (besides print its output) can only make the overall program slower, not faster.

On the above worst-case input, your "repeated binary search" algorithm runs in \$O(n \log n)\$ time, which is worse than \$O(n)\$.

Now, if you're not concerned about worst-case inputs, but merely with the average input... well, then you'd have to define a notion of "the average input". You'll have to come up with some model of what kinds of input you expect. One way to do this is to come up with an algorithm for generating inputs, and then treat it as your model. For example:

```
def get_random_input(n):
    return sorted([random.randrange(n) for i in xrange(n)])
```

Or:

```
def get_random_input(n):
    return sorted([random.randrange(100) for i in xrange(n)])
```

On these two different models, your algorithm performs very differently. On the latter, you do well: \$O(\log n)\$. On the former, you do poorly: \$O(n \log n)\$.

This is kind of like asking for a compression algorithm that compresses all possible inputs. You can get compression for some inputs only at the expense of others; the art is deciding which kinds of inputs are "interesting" and which can safely be blown up. In this case, you can get sub-\$O(n)\$ performance for some inputs only at the expense of others; the art is deciding which kinds of inputs are "interesting", which is to say, deciding on a model for your "average" or "expected" input.

EDIT: My analogy above is flawed, as indicated by LinMa's comments below this answer. See, if you can get \$O(\log n)\$ performance on your "expected" inputs while blowing up the "unexpected" inputs by only a constant factor (e.g. "\$O(n)\$ plus a constant number of comparisons" or "\$O(n)\$ times two"), then you can keep your algorithm's worst-case big-O "constant" while improving the best-case.

Hope this helps!
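To see the input-model point in running code, here is a small sketch (my own illustration, not the answerer's) that instruments a bisect-based variant of the question's run-skipping idea and counts the binary searches it performs under the two models above — the search count is exactly \$k\$, the number of distinct values:

```python
import random
from bisect import bisect_right

def count_runs_instrumented(numbers):
    # Returns (counts, searches): one binary search per distinct value.
    counts, searches, i = {}, 0, 0
    while i < len(numbers):
        j = bisect_right(numbers, numbers[i], i)  # end of the current run
        searches += 1
        counts[numbers[i]] = j - i
        i = j
    return counts, searches

n = 10000
# Model 1: about n distinct values -> roughly n searches, O(n log n) overall.
many_distinct = sorted(random.randrange(n) for _ in range(n))
# Model 2: at most 100 distinct values -> at most 100 searches, O(log n) overall.
few_distinct = sorted(random.randrange(100) for _ in range(n))

_, s_many = count_runs_instrumented(many_distinct)
_, s_few = count_runs_instrumented(few_distinct)
print(s_many, s_few)  # s_many is in the thousands; s_few is at most 100
```

Under the second model the work stays essentially flat as \$n\$ grows, which is exactly the regime where the questioner's approach pays off.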
Context
StackExchange Code Review Q#155808, answer score: 6