patternMinor
Datastructure for insertion and scored extraction
Viewed 0 times
insertionextractionscoreddatastructureforand
Problem
I have a simple program (assume a shop) that is actually just returning the $m$ most active customers from a total list of $n$ users (so $m \leq n$). I only use this program on one machine with one process, so assume no distribution of data to be necessary.
I require a data structure storing $\mathrm{userId}$ and $\mathrm{score}$, where $\mathrm{score}$ is an unsigned integer. In the beginning it's $\mathrm{score}:=0$ for all $n$ users. I can
I have an
At the moment my prototype uses a hashmap for the scoring and a linear min-search, so complexity for
What data structure should I use? Is there such a data structure for the operations
I require a data structure storing $\mathrm{userId}$ and $\mathrm{score}$, where $\mathrm{score}$ is an unsigned integer. In the beginning it's $\mathrm{score}:=0$ for all $n$ users. I can
extract $m$ users beginning with the highest $\mathrm{score}$ (filling up with any other user, so the first extract just returns any $m$ users).I have an
increment(x) function that is called often. It increases $\mathrm{score}$ of the user with $\mathrm{userId} = x$ by $1$. Afterwards I want to call extract again, where the result may differ by $1$ user (the last incremented score may lead to that $\mathrm{userId}$ replacing another $\mathrm{userId}$ within the $n$ members).At the moment my prototype uses a hashmap for the scoring and a linear min-search, so complexity for
increment is $O(1)$ and for extract is $O(n\cdot m)$. I feel like there should exist a data structure where extract is $O(m)$ (min-heap maybe?) while increment is either $O(\log n )$ or even $O(1)$. Assuming $m = \log n$, this would lead to a speedup of $n$ which would be immense even for small datasets.What data structure should I use? Is there such a data structure for the operations
extract and increment(x)? And if not, is there a theoretical proof that there is none?Solution
You can use left threaded AVL trees with score as key and hash (expected $\mathcal O(1)$ insert/delete or simple AVL, logarithmic but with value much smaller than $n$) as values to keep id's, and all the time keep the data sorted with $\mathcal O(1)$ inorder traversal and $\mathcal O(\log n)$ insert (which in case of your increment operation is one delete, one insert). Also you might want to keep the last result (which is sorted) to check whether some new id appeared and use binary search if it did.
Overall this gives $\mathcal O(m)$ extract (which cannot be better) and two $\mathcal O(\log n)$ operations per increment.
Overall this gives $\mathcal O(m)$ extract (which cannot be better) and two $\mathcal O(\log n)$ operations per increment.
Context
StackExchange Computer Science Q#77608, answer score: 3
Revisions (0)
No revisions yet.