patternMinor
Data structure for map on intervals
Viewed 0 times
mapintervalsforstructuredata
Problem
Let $n$ be an integer, and let $\mathbb{Z}$ denote the set of all integers. Let $[a,b]$ denote the interval of integers $\{a,a+1,a+2,\dots,b\}$.
I am looking for a data structure to represent a map $f:[1,n] \to \mathbb{Z}$. I want the data structure to support the following operations:
-
$\text{get}(i)$ should return $f(i)$.
-
$\text{set}([a,b],y)$ should update $f$ so that $f(a)=f(a+1)=\cdots=f(b)=y$, i.e., update $f$ to a new map $f'$ such that $f'(i) = y$ for $i \in [a,b]$ and $f'(i) = f(i)$ for $i \notin [a,b]$.
-
$\text{stab}(i)$ should return the largest interval $[a,b]$ such that $i \in [a,b]$ and $f$ is constant on $[a,b]$ (i.e., $f(a)=f(a+1)=\cdots=f(b)$).
-
$\text{add}([a,b],\delta)$ should update $f$ to a new map $f'$ such that $f'(i) = f(i) + \delta$ for $i \in [a,b]$ and $f'(i) = f(i)$ for $i \notin [a,b]$.
I want each of these operations to be efficient. I would count $O(1)$ or $O(\lg n)$ time as efficient, but $O(n)$ time is too slow. It's OK if the running times are amortized running times. Is there a data structure that simultaneously makes all of these operations efficient?
(I've noticed a similar pattern come up in a several programming challenges. This is a generalization that would suffice for all of those challenge problems.)
I am looking for a data structure to represent a map $f:[1,n] \to \mathbb{Z}$. I want the data structure to support the following operations:
-
$\text{get}(i)$ should return $f(i)$.
-
$\text{set}([a,b],y)$ should update $f$ so that $f(a)=f(a+1)=\cdots=f(b)=y$, i.e., update $f$ to a new map $f'$ such that $f'(i) = y$ for $i \in [a,b]$ and $f'(i) = f(i)$ for $i \notin [a,b]$.
-
$\text{stab}(i)$ should return the largest interval $[a,b]$ such that $i \in [a,b]$ and $f$ is constant on $[a,b]$ (i.e., $f(a)=f(a+1)=\cdots=f(b)$).
-
$\text{add}([a,b],\delta)$ should update $f$ to a new map $f'$ such that $f'(i) = f(i) + \delta$ for $i \in [a,b]$ and $f'(i) = f(i)$ for $i \notin [a,b]$.
I want each of these operations to be efficient. I would count $O(1)$ or $O(\lg n)$ time as efficient, but $O(n)$ time is too slow. It's OK if the running times are amortized running times. Is there a data structure that simultaneously makes all of these operations efficient?
(I've noticed a similar pattern come up in a several programming challenges. This is a generalization that would suffice for all of those challenge problems.)
Solution
I believe logarithmic time for all queries is achievable. The main idea is to use an interval tree, where each node in the tree corresponds to an interval of indices. I'll build up the key ideas by starting with a simpler version of the data structure (that can support get and set but not the other operations), then add features to support the other features as well.
A simple scheme (supports get and set, but not add or stab)
Say that an interval $[a,b]$ is flat if the function $f$ is constant on $[a,b]$, i.e., if $f(a)=f(a+1)=\cdots =f(b)$.
Our simple data structure will be an interval tree. In other words, we have a binary tree, where each node corresponds to an interval (of indices). We'll store the corresponding interval $I(v)$ in each node $v$ of the tree. Each leaf will correspond to a flat interval, and they'll be arranged so that reading out the leaves left-to-right gives us a sequence of consecutive flat intervals which are disjoint and whose union is all of $[1,n]$. The interval for an internal node will be the union of the intervals of its two children. Also, in each leaf node $\ell$ we will store the value $V(\ell)$ of the function $f$ on the interval $I(\ell)$ corresponding to this node (note that this interval is flat, so $f$ is constant on the interval, so we just store a single value of $f$ in each leaf node).
Equivalently, you can imagine that we partition $[1,n]$ into flat intervals, and then the data structure is a binary search tree where the keys are the left endpoints of those intervals. The leaves contain the value of $f$ at some range of indices where $f$ is constant.
Use standard methods to ensure that the binary tree remains balanced, i.e., its depth is $O(\lg m)$ (where $m$ counts the current number of leaves in the tree). Of course, $m\le n$, so the depth is always at most $O(\lg n)$. This will be helpful below.
We can now support the get and set operations as follows:
-
$\text{get}(i)$ is easy: we traverse the tree to find the leaf whose interval contains $i$. This is basically just traversing a binary search tree. Since the depth is $O(\lg n)$, the running time is $O(\lg n)$.
-
$\text{set}([a,b],y)$ is trickier. It works like this:
-
First, we find the leaf interval $[a_0,b_0]$ containing $a$; if $a_0
-
Next, we find the leaf interval $[a_1,b_1]$ containing $b$; if $b
-
At this point, I claim that the interval $[a,b]$ can be expressed as the disjoint union of $O(\lg n)$ intervals corresponding to some subset of $O(\lg n)$ nodes in the tree. So, delete all the descendants of those nodes (turning them into leaves) and set the value stored in those nodes to $y$.
-
Finally, since we modified the shape of the tree, we'll perform any necessary rotations to re-balance the tree (using any standard technique for keeping a tree balanced).
Since this operation involves a few simple operations on $O(\lg n)$ nodes (and that set of nodes can be easily found in $O(\lg n)$ time), the total time for this operation is $O(\lg n)$.
This shows that we can support both the get and set operations in $O(\lg n)$ time per operation. In fact, the running time can be shown to be $O(\lg \min(n,s))$, where $s$ is the number of set operations performed up until now.
Adding support for add
We can modify the above data structure so that it can also support the add operation. In particular, instead of storing the value of the function in the leaves, it'll be represented as the sum of numbers stored in a set of nodes.
More precisely, the value $f(i)$ of the function at input $i$ will be recoverable as the sum of the values stored in the nodes on the path from the root of the tree down to the leaf whose interval contains $i$. In each node $v$ we'll store a value $V(v)$; if $v_0,v_1,\dots,v_k$ represent the ancestors of a leaf $v_k$ (including the leaf itself), then the value of function at $I(v_k)$ will be $V(v_0)+\dots + V(v_k)$.
It is easy to support the get and set operations using a variant of the techniques described above. Basically, as we traverse the tree downward, we keep track of the running sum of values, so that for each node $x$ that the traversal visit, we'll know the sum of values of the nodes on the path from the root to $x$. Once we do that, simple adjustments to the implementation of get and set described above will suffice.
And now we can support $\text{add}([a,b],\delta)$ efficiently. First, we express the interval $[a,b]$ as the union of $O(\lg n)$ intervals corresponding to some set of $O(\lg n)$ nodes in the tree (splitting a node at the left endpoint and right endpoint if needed), exactly as done in steps 1-3 of the set operation. Now, we simply add $\delta$ to the value stored in each of those $O(\lg n)$ nodes. (We don't delete their descendants.)
This provides a way to support get, set, and add, in $O(\lg n)$ time per operation. In fact, the running time per operation is $O(\lg \min(n,s))$ where $s$ counts the number of set operations pl
A simple scheme (supports get and set, but not add or stab)
Say that an interval $[a,b]$ is flat if the function $f$ is constant on $[a,b]$, i.e., if $f(a)=f(a+1)=\cdots =f(b)$.
Our simple data structure will be an interval tree. In other words, we have a binary tree, where each node corresponds to an interval (of indices). We'll store the corresponding interval $I(v)$ in each node $v$ of the tree. Each leaf will correspond to a flat interval, and they'll be arranged so that reading out the leaves left-to-right gives us a sequence of consecutive flat intervals which are disjoint and whose union is all of $[1,n]$. The interval for an internal node will be the union of the intervals of its two children. Also, in each leaf node $\ell$ we will store the value $V(\ell)$ of the function $f$ on the interval $I(\ell)$ corresponding to this node (note that this interval is flat, so $f$ is constant on the interval, so we just store a single value of $f$ in each leaf node).
Equivalently, you can imagine that we partition $[1,n]$ into flat intervals, and then the data structure is a binary search tree where the keys are the left endpoints of those intervals. The leaves contain the value of $f$ at some range of indices where $f$ is constant.
Use standard methods to ensure that the binary tree remains balanced, i.e., its depth is $O(\lg m)$ (where $m$ counts the current number of leaves in the tree). Of course, $m\le n$, so the depth is always at most $O(\lg n)$. This will be helpful below.
We can now support the get and set operations as follows:
-
$\text{get}(i)$ is easy: we traverse the tree to find the leaf whose interval contains $i$. This is basically just traversing a binary search tree. Since the depth is $O(\lg n)$, the running time is $O(\lg n)$.
-
$\text{set}([a,b],y)$ is trickier. It works like this:
-
First, we find the leaf interval $[a_0,b_0]$ containing $a$; if $a_0
-
Next, we find the leaf interval $[a_1,b_1]$ containing $b$; if $b
-
At this point, I claim that the interval $[a,b]$ can be expressed as the disjoint union of $O(\lg n)$ intervals corresponding to some subset of $O(\lg n)$ nodes in the tree. So, delete all the descendants of those nodes (turning them into leaves) and set the value stored in those nodes to $y$.
-
Finally, since we modified the shape of the tree, we'll perform any necessary rotations to re-balance the tree (using any standard technique for keeping a tree balanced).
Since this operation involves a few simple operations on $O(\lg n)$ nodes (and that set of nodes can be easily found in $O(\lg n)$ time), the total time for this operation is $O(\lg n)$.
This shows that we can support both the get and set operations in $O(\lg n)$ time per operation. In fact, the running time can be shown to be $O(\lg \min(n,s))$, where $s$ is the number of set operations performed up until now.
Adding support for add
We can modify the above data structure so that it can also support the add operation. In particular, instead of storing the value of the function in the leaves, it'll be represented as the sum of numbers stored in a set of nodes.
More precisely, the value $f(i)$ of the function at input $i$ will be recoverable as the sum of the values stored in the nodes on the path from the root of the tree down to the leaf whose interval contains $i$. In each node $v$ we'll store a value $V(v)$; if $v_0,v_1,\dots,v_k$ represent the ancestors of a leaf $v_k$ (including the leaf itself), then the value of function at $I(v_k)$ will be $V(v_0)+\dots + V(v_k)$.
It is easy to support the get and set operations using a variant of the techniques described above. Basically, as we traverse the tree downward, we keep track of the running sum of values, so that for each node $x$ that the traversal visit, we'll know the sum of values of the nodes on the path from the root to $x$. Once we do that, simple adjustments to the implementation of get and set described above will suffice.
And now we can support $\text{add}([a,b],\delta)$ efficiently. First, we express the interval $[a,b]$ as the union of $O(\lg n)$ intervals corresponding to some set of $O(\lg n)$ nodes in the tree (splitting a node at the left endpoint and right endpoint if needed), exactly as done in steps 1-3 of the set operation. Now, we simply add $\delta$ to the value stored in each of those $O(\lg n)$ nodes. (We don't delete their descendants.)
This provides a way to support get, set, and add, in $O(\lg n)$ time per operation. In fact, the running time per operation is $O(\lg \min(n,s))$ where $s$ counts the number of set operations pl
Context
StackExchange Computer Science Q#18542, answer score: 4
Revisions (0)
No revisions yet.