patternMinor
Data structure choice for a query-update-delete problem
Viewed 0 times
problemupdatedeletequerychoiceforstructuredata
Problem
Given is an initial set of n keys. Each key k is of the form (p, q). Note that both p and q are positive.
At any given point, there are two possible actions:
1) Query-Delete: Given a value s as query, find all such keys in the set that have have the (globally) minimum p and satisfy (q >= s), and delete these keys.
2) Insert: Add a new key k' = (u, v) to the set.
I was wondering what would be the best data structure to represent the above.
My first approach is to simply sort the original set in a linked list (or array) and return the appropriate tuples when queried, and if inserted, use linear search to find the position, but the worst-case insertion then becomes O(n), while query-deletion involves checking the first element, and has the complexity O(number of distinct q values).
A second approach is to use a red-black tree with the p values as its key, storing (p, q, r) in each node, where r is the biggest q value found at that level or below in the tree.
For example, given a set (10, 4), (20, 2), (30, 2), (40, 3) the tree looks like:
where each node is of the form (p, q, r).
This approach has the benefit of being able to early exit during the query search if (r < s) in the node. Insertion remains O(log n) like in red-black trees, but may face problems with sets of the form {(p, q1), (p, q2), ...}.
Could anyone please suggest modifications or propose a better approach? Thanks.
At any given point, there are two possible actions:
1) Query-Delete: Given a value s as query, find all such keys in the set that have have the (globally) minimum p and satisfy (q >= s), and delete these keys.
2) Insert: Add a new key k' = (u, v) to the set.
I was wondering what would be the best data structure to represent the above.
My first approach is to simply sort the original set in a linked list (or array) and return the appropriate tuples when queried, and if inserted, use linear search to find the position, but the worst-case insertion then becomes O(n), while query-deletion involves checking the first element, and has the complexity O(number of distinct q values).
A second approach is to use a red-black tree with the p values as its key, storing (p, q, r) in each node, where r is the biggest q value found at that level or below in the tree.
For example, given a set (10, 4), (20, 2), (30, 2), (40, 3) the tree looks like:
(20, 2, 4)
/ \
(10, 4, 4) (30, 2, 3)
\
(40, 3, 3)where each node is of the form (p, q, r).
This approach has the benefit of being able to early exit during the query search if (r < s) in the node. Insertion remains O(log n) like in red-black trees, but may face problems with sets of the form {(p, q1), (p, q2), ...}.
Could anyone please suggest modifications or propose a better approach? Thanks.
Solution
Store your data structure in a red-black tree where the first component of a key (i.e. p in (p,q)) is the key and the values are red-black trees of the second parts of keys with identical first part.
So, for the set $\{(a,b), (a,d), (d,b), (e,f), (e,g)\}$, you would store a tree with three keys (a, d, and e). The value for the key a would be a red-black tree with b and d, the for the key d would be a red-black tree with b, and the value for the key e would be a red-black tree with f and g.
A red-black tree can be split in $O(\lg n)$ time, so query-delete would take $O(\lg n) + O(\lg n) = O(\lg n)$ time - $O(\lg n)$ for the tree lookup and $O(\lg n)$ for the split of the tree. This doesn't include the cost of deleting, deconstructing, finalizing, or disposing of the deleted elements.
insert would also be $O(\lg n)$.
Query-delete can be made $O(1 + \lg k)$, where $k$ is the size of the deleted set, by keeping a pointer to the smallest key in the big tree and searching from the ends of the little tree, rather than from the root.
Amortized, query-delete is $O(1)$: using the banker's method, simply put a single on each item in the little trees. Since returning $k$ items takes $O(1 + \lg k)$ time and returns $k$ coins, its amortized cost is now $O(1)$, while insert's is $O(1 + \lg n) = O(\lg n)$.
So, for the set $\{(a,b), (a,d), (d,b), (e,f), (e,g)\}$, you would store a tree with three keys (a, d, and e). The value for the key a would be a red-black tree with b and d, the for the key d would be a red-black tree with b, and the value for the key e would be a red-black tree with f and g.
A red-black tree can be split in $O(\lg n)$ time, so query-delete would take $O(\lg n) + O(\lg n) = O(\lg n)$ time - $O(\lg n)$ for the tree lookup and $O(\lg n)$ for the split of the tree. This doesn't include the cost of deleting, deconstructing, finalizing, or disposing of the deleted elements.
insert would also be $O(\lg n)$.
Query-delete can be made $O(1 + \lg k)$, where $k$ is the size of the deleted set, by keeping a pointer to the smallest key in the big tree and searching from the ends of the little tree, rather than from the root.
Amortized, query-delete is $O(1)$: using the banker's method, simply put a single on each item in the little trees. Since returning $k$ items takes $O(1 + \lg k)$ time and returns $k$ coins, its amortized cost is now $O(1)$, while insert's is $O(1 + \lg n) = O(\lg n)$.
Context
StackExchange Computer Science Q#54779, answer score: 2
Revisions (0)
No revisions yet.