HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Datastructure for insertion and scored extraction

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
insertionextractionscoreddatastructureforand

Problem

I have a simple program (assume a shop) that is actually just returning the $m$ most active customers from a total list of $n$ users (so $m \leq n$). I only use this program on one machine with one process, so assume no distribution of data to be necessary.

I require a data structure storing $\mathrm{userId}$ and $\mathrm{score}$, where $\mathrm{score}$ is an unsigned integer. In the beginning it's $\mathrm{score}:=0$ for all $n$ users. I can extract $m$ users beginning with the highest $\mathrm{score}$ (filling up with any other user, so the first extract just returns any $m$ users).

I have an increment(x) function that is called often. It increases $\mathrm{score}$ of the user with $\mathrm{userId} = x$ by $1$. Afterwards I want to call extract again, where the result may differ by $1$ user (the last incremented score may lead to that $\mathrm{userId}$ replacing another $\mathrm{userId}$ within the $n$ members).

At the moment my prototype uses a hashmap for the scoring and a linear min-search, so complexity for increment is $O(1)$ and for extract is $O(n\cdot m)$. I feel like there should exist a data structure where extract is $O(m)$ (min-heap maybe?) while increment is either $O(\log n )$ or even $O(1)$. Assuming $m = \log n$, this would lead to a speedup of $n$ which would be immense even for small datasets.

What data structure should I use? Is there such a data structure for the operations extract and increment(x)? And if not, is there a theoretical proof that there is none?

Solution

You can use left threaded AVL trees with score as key and hash (expected $\mathcal O(1)$ insert/delete or simple AVL, logarithmic but with value much smaller than $n$) as values to keep id's, and all the time keep the data sorted with $\mathcal O(1)$ inorder traversal and $\mathcal O(\log n)$ insert (which in case of your increment operation is one delete, one insert). Also you might want to keep the last result (which is sorted) to check whether some new id appeared and use binary search if it did.

Overall this gives $\mathcal O(m)$ extract (which cannot be better) and two $\mathcal O(\log n)$ operations per increment.

Context

StackExchange Computer Science Q#77608, answer score: 3

Revisions (0)

No revisions yet.