patternMinor
Using persistence on a constant database
Viewed 0 times
persistenceconstantdatabaseusing
Problem
I'm quite new to understanding persistent data structures, so bear with me.
Consider some database with data of the form $(\text{id}, d_1, d_2)$. The last two could e.g. mean production data and sales date, respectively. The database is constant, i.e. we will not be adding or removing data points.
Let $n$ be the number of items in the database. I want to be able to do the following query:
Query$(a,b,s)$: Return the number of objects that have $a\leq d_1 \leq b$ and $d_2 \leq s$.
I want to be able to perform the query in $\mathcal O(\log n)$ time, with the best possible space and preprocessing time complexities. My question is how could one construct a persistent data structure that solves this problem?
I think that I need to use timestamps that relate to the $d_1$ and/or $d_2$ in some way. However, I'm not quite sure how.
Consider some database with data of the form $(\text{id}, d_1, d_2)$. The last two could e.g. mean production data and sales date, respectively. The database is constant, i.e. we will not be adding or removing data points.
Let $n$ be the number of items in the database. I want to be able to do the following query:
Query$(a,b,s)$: Return the number of objects that have $a\leq d_1 \leq b$ and $d_2 \leq s$.
I want to be able to perform the query in $\mathcal O(\log n)$ time, with the best possible space and preprocessing time complexities. My question is how could one construct a persistent data structure that solves this problem?
I think that I need to use timestamps that relate to the $d_1$ and/or $d_2$ in some way. However, I'm not quite sure how.
Solution
Yes, you can do this in $O(\log n)$ time. Use a sweep-line algorithm together with a persistent tree data structure.
Imagine the point $(d_1,d_2)$ as being on a grid, with $d_1$ as the value of the $x$-coordinate and $d_2$ as the value of the $y$-coordinate. Your sweep line is a horizontal line, namely, at time $t$, the sweep line is the line $d_2=t$.
You will build a data structure that stores all the points $(d_1,d_t)$ such that $d_2 \le t$. You can use a balanced binary search tree, keyed on the value of $d_1$. This data structure will let you answer the query Query$(a,b,t)$ in $O(\log n)$ time.
Now you're going to gradually increase the value of $t$ (sweeping the line upward), and adjusting the data structure as you go. You'll save a copy of the data structure for each value of $t$ that occurs in the $d_2$-database.
This sounds like it requires $O(n^2)$ space. However, there's a clever trick. Use a persistent data structure to store the trees. As you move the sweep line upward, you insert more items into the tree. A persistent data structure will let you pick any time $t$ and do a lookup into the state of the tree at time $t$. This lets you answer Query$(a,b,t)$ in time $O(\lg n)$.
Imagine the point $(d_1,d_2)$ as being on a grid, with $d_1$ as the value of the $x$-coordinate and $d_2$ as the value of the $y$-coordinate. Your sweep line is a horizontal line, namely, at time $t$, the sweep line is the line $d_2=t$.
You will build a data structure that stores all the points $(d_1,d_t)$ such that $d_2 \le t$. You can use a balanced binary search tree, keyed on the value of $d_1$. This data structure will let you answer the query Query$(a,b,t)$ in $O(\log n)$ time.
Now you're going to gradually increase the value of $t$ (sweeping the line upward), and adjusting the data structure as you go. You'll save a copy of the data structure for each value of $t$ that occurs in the $d_2$-database.
This sounds like it requires $O(n^2)$ space. However, there's a clever trick. Use a persistent data structure to store the trees. As you move the sweep line upward, you insert more items into the tree. A persistent data structure will let you pick any time $t$ and do a lookup into the state of the tree at time $t$. This lets you answer Query$(a,b,t)$ in time $O(\lg n)$.
Context
StackExchange Computer Science Q#62561, answer score: 3
Revisions (0)
No revisions yet.