HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Data structure choice for a query-update-delete problem

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
problemupdatedeletequerychoiceforstructuredata

Problem

Given is an initial set of n keys. Each key k is of the form (p, q). Note that both p and q are positive.

At any given point, there are two possible actions:

1) Query-Delete: Given a value s as query, find all such keys in the set that have have the (globally) minimum p and satisfy (q >= s), and delete these keys.

2) Insert: Add a new key k' = (u, v) to the set.

I was wondering what would be the best data structure to represent the above.

My first approach is to simply sort the original set in a linked list (or array) and return the appropriate tuples when queried, and if inserted, use linear search to find the position, but the worst-case insertion then becomes O(n), while query-deletion involves checking the first element, and has the complexity O(number of distinct q values).

A second approach is to use a red-black tree with the p values as its key, storing (p, q, r) in each node, where r is the biggest q value found at that level or below in the tree.

For example, given a set (10, 4), (20, 2), (30, 2), (40, 3) the tree looks like:

(20, 2, 4)
       /          \
 (10, 4, 4)        (30, 2, 3)
                     \
                      (40, 3, 3)


where each node is of the form (p, q, r).

This approach has the benefit of being able to early exit during the query search if (r < s) in the node. Insertion remains O(log n) like in red-black trees, but may face problems with sets of the form {(p, q1), (p, q2), ...}.

Could anyone please suggest modifications or propose a better approach? Thanks.

Solution

Store your data structure in a red-black tree where the first component of a key (i.e. p in (p,q)) is the key and the values are red-black trees of the second parts of keys with identical first part.

So, for the set $\{(a,b), (a,d), (d,b), (e,f), (e,g)\}$, you would store a tree with three keys (a, d, and e). The value for the key a would be a red-black tree with b and d, the for the key d would be a red-black tree with b, and the value for the key e would be a red-black tree with f and g.

A red-black tree can be split in $O(\lg n)$ time, so query-delete would take $O(\lg n) + O(\lg n) = O(\lg n)$ time - $O(\lg n)$ for the tree lookup and $O(\lg n)$ for the split of the tree. This doesn't include the cost of deleting, deconstructing, finalizing, or disposing of the deleted elements.

insert would also be $O(\lg n)$.

Query-delete can be made $O(1 + \lg k)$, where $k$ is the size of the deleted set, by keeping a pointer to the smallest key in the big tree and searching from the ends of the little tree, rather than from the root.

Amortized, query-delete is $O(1)$: using the banker's method, simply put a single on each item in the little trees. Since returning $k$ items takes $O(1 + \lg k)$ time and returns $k$ coins, its amortized cost is now $O(1)$, while insert's is $O(1 + \lg n) = O(\lg n)$.

Context

StackExchange Computer Science Q#54779, answer score: 2

Revisions (0)

No revisions yet.