HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Finding the $k$th largest element in an evolving query data structure

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
thequerylargestelementstructurefindingevolvingdata

Problem

Basically, the problem I am solving is this. Initially, the array $A$ is empty. Then I am given data to fill the array and at any time I have to make a query to print the $|A|/3$-th largest element inserted so far.

I was solving the problem with segment trees, but I am not able to make a little modification to the query function of the segment tree. The query function that I wrote returns the largest element between indices $a_{\text{begin}}$ and $a_{\text{end}}$:

int query(int Nodenumber,int t_begin,int t_end,int a_begin,int a_end)
{
    if (t_begin>=a_begin && t_end=a_begin && t_begin=a_begin && mid+1<=a_end)
            res = max(res,query(2*Nodenumber+1,mid+1,t_end,a_begin,a_end));

        return res;
    }
}


Note to make a query, I call the query function as query(1,0,N-1,QA,QB).

But I want to return the $|A|/3$-th largest element between indices $a_{\text{begin}}$ and $a_{\text{end}}$. So how should I modify the function to do this?

So updating, queries, updating, queries, updating, queries and so on are done randomly and several (upto $10^5$) times.

So, for solving the problem, did I pick the right data structure? I thought of using heaps, but that will be too slow, as I would have to pop $|A|/3$ elements from the top and reinsert them for every query.

Solution

Have a look at the Wikipedia page for segment trees:


It is, in principle, a static structure; that is, its content cannot be modified once the structure is built.

So it does not seem to be a good data structure for your scenario at all.

Note that the kind of query you want to perform is called selection (unimaginative, I know). Check the linked article for loads of algorithms. One simple way to solve your problem is this:

  • Store elements in a sorted list. That means, updates (that is insertions) take linear time.



  • For the queries, run a selection algorithm on the specified interval.



This approach can be improved by extending the list to a skip list.

Another approach is using binary search trees. If you use a self-balancing variant, that makes updates possible in $O(\log n)$ time. The queries you want can be implemented by storing in every node how many descendants it has, so we can find the $k$-th largest element in $O(\log n)$ time.

Context

StackExchange Computer Science Q#2575, answer score: 5

Revisions (0)

No revisions yet.