Transitive reduction of rectangle containment hierarchy DAG
Tags: containment, rectangle, hierarchy, transitive-reduction, dag
Problem
I am looking for an $O(|V| + |E|)$ algorithm for finding the transitive reduction of a rectangle containment hierarchy DAG, i.e. a DAG with a directed edge from one rectangle to another exactly when the first rectangle contains the second.
That is, remove as many edges as possible so that, for arbitrary $u$ and $v$, if you could reach $v$ from $u$ before, you can still reach it after the removal.
Assume that rectangles are unique, so we are dealing with a DAG.
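To fix notation, the containment test and the pairwise construction of the initial DAG can be sketched as follows (illustrative Python; `contains` and `containment_dag` are our names, not from any reference):

```python
from itertools import combinations

def contains(outer, inner):
    """True if rectangle `outer` contains rectangle `inner`.

    Each rectangle is ((lo_1, ..., lo_d), (hi_1, ..., hi_d)).
    The test takes O(d) time; for plane rectangles d = 2.
    """
    (olo, ohi), (ilo, ihi) = outer, inner
    return (all(a <= b for a, b in zip(olo, ilo))
            and all(b <= a for a, b in zip(ohi, ihi)))

def containment_dag(rects):
    """Build the full containment DAG with O(n^2) pairwise tests.

    An edge (u, v) means rects[u] contains rects[v].  Assumes the
    rectangles are unique, so the relation is acyclic.
    """
    edges = set()
    for u, v in combinations(range(len(rects)), 2):
        if contains(rects[u], rects[v]):
            edges.add((u, v))
        elif contains(rects[v], rects[u]):
            edges.add((v, u))
    return edges
```

For three nested squares, this produces all three containment edges, including the transitively redundant one that a reduction would remove.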
This is useful, for example, in quantitative association rule mining from Srikant/Agrawal 1996. There, we are interested in the close ancestors of hyperrectangles, i.e. the ancestors that contain us. Determining a close ancestor is similar to determining a close descendant, which is related to transitive reduction under rectangle containment. This kind of rule mining is related to the APRIORI algorithm for standard (i.e. "Boolean") association rule mining from Agrawal/Srikant 1994.
A similar problem is the transitive reduction of a standard DAG, which is discussed here:
Transitive reduction of DAG
References
- Agrawal, Srikant - Fast algorithms for mining association rules (1994)
- Srikant, Agrawal - Mining quantitative association rules in large relational tables (1996)
Solution
Dec. 14, 2018
Standard approach and a heuristic alternative
Standard approach -- use graph and distance product
We note that transitive reduction is reducible to transitive closure, and vice versa, each with $O(n ^ 2)$ additional time. Then, we know that if we can solve transitive closure in $O(n ^ 2 \cdot \textrm{polylog}(n))$ time, then we can also solve lax Boolean matrix multiplication (MM) in similar time by reducing it to transitive closure (or transitive reduction), according to Fischer/Meyer 1971. Lax Boolean MM is MM where input values are in $\{0, 1\}$ and we work over the $(\textrm{OR}, \textrm{AND})$ semiring. We can solve lax Boolean MM by reducing it to standard (plus, times) MM. This means that a significantly subcubic algorithm for transitive reduction is currently unlikely. We can reduce node-based DAG transitive reduction to rectangle-containment-based DAG transitive reduction via a topological ordering and shapes whose size is inversely related to the distance from the source, so the rectangle version is no easier. For the rectangle application we should of course have no cycles; these only arise when rectangles are identical, in which case we should remove duplicates. A more direct algorithm computes the reduction directly, assuming a sparse graph, and takes $O(n ^ 3)$ time. These bounds also ignore that, to build the initial DAG, we ought to perform $n ^ 2$ containment tests, each taking time in $O(d) = O(2) = O(1)$.
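The direct cubic-time computation can be sketched as follows (illustrative Python, our function name; the closure is computed Floyd-Warshall style here rather than via fast MM):

```python
def transitive_reduction(n, edges):
    """Transitive reduction of a DAG on nodes 0..n-1 in O(n^3) time.

    Step 1: Boolean transitive closure, Floyd-Warshall style.
    Step 2: keep an edge u -> v only if no intermediate node w with
            u -> w and w reaching v certifies it indirectly.
    For a DAG this yields the unique transitive reduction.
    """
    reach = [[False] * n for _ in range(n)]
    for u, v in edges:
        reach[u][v] = True
    for w in range(n):
        for u in range(n):
            if reach[u][w]:
                for v in range(n):
                    if reach[w][v]:
                        reach[u][v] = True
    return {(u, v) for u, v in edges
            if not any(reach[u][w] and reach[w][v] for w in range(n))}
```

For the three nested squares from the problem setup, this drops the edge from the outermost to the innermost rectangle and keeps the two direct containment edges.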
Heuristic alternative -- use R-tree
Alternatively, assuming the dimension $d$ is moderately low and fixed, and not tightly bound by $O(n)$, we can arrive at times that appear subcubic in $n$. We have two options -- both use an R-tree variant that is bulk-loaded, e.g. via the sort-tile-recursive (STR) approach of Leutenegger 1997, that is balanced, and that is made dynamic via bkd-tree-style occasional rebuilding (see Procopiuc 2003) to support inserts and deletes in $O(\log^2(n))$ time. See Guttman 1984 for details on the standard R-tree. The main idea is that we find close descendants for each of the $n$ provided rectangles via a total of $n$ close-descendant queries. We consider both options heuristic because we may generate many intermediate close-descendant candidates that do not ultimately survive, even though the subqueries have good guaranteed time for option one.
If pair-wise rectangles only infrequently involve enclosure/containment, and overlap is highly correlated with enclosure/containment, then it is conceivable that we can get better performance using an R-tree via either of the two options. We note that a rectangle enclosure query asks, for a given query rectangle, for the rectangles that contain it, and a rectangle containment query asks, for a given query rectangle, for the rectangles that fit inside it. A close-descendant query works roughly as follows, heuristically. We find primitive rectangles contained by the query rectangle and use auxiliary queries to see whether the returned rectangles have parents that are themselves contained by the original query rectangle. We store the candidate close descendants in a "conflict" secondary R-tree to speed up the checks -- enclosure subqueries with early stopping -- that determine whether a candidate parent encloses (and so disqualifies) a candidate close descendant; we maintain this conflict tree via inserts and deletes. When we encounter rectangles of the same shape, we may keep one of them arbitrarily. We use a best-first priority queue to guide the order in which bounding boxes are considered; we prefer contained bounding boxes and tie-break by preferring larger bounding box area (because it is harder to be a middle-man for a larger contained box).
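The close-descendant query might be sketched roughly like this (a toy best-first traversal; the `Node` class is a hypothetical stand-in for a real R-tree, and the conflict check is a linear scan over the found set rather than the secondary conflict R-tree):

```python
import heapq

def contains(outer, inner):
    """True if box `outer` contains box `inner` (boxes are (lo, hi) tuples)."""
    (olo, ohi), (ilo, ihi) = outer, inner
    return (all(a <= b for a, b in zip(olo, ilo))
            and all(b <= a for a, b in zip(ohi, ihi)))

def overlaps(a, b):
    """True if boxes a and b intersect in every dimension."""
    (alo, ahi), (blo, bhi) = a, b
    return (all(x <= y for x, y in zip(alo, bhi))
            and all(x <= y for x, y in zip(blo, ahi)))

def area(box):
    lo, hi = box
    p = 1
    for a, b in zip(lo, hi):
        p *= b - a
    return p

class Node:
    """Toy R-tree node: a leaf stores its payload rectangle in `rect`."""
    def __init__(self, box, children=None, rect=None):
        self.box = box
        self.children = children or []
        self.rect = rect

def close_descendants(root, q):
    """Best-first sketch of a close-descendant query for query rectangle q.

    We prefer bounding boxes already contained in q and tie-break by
    larger area (harder to be a middle-man for a larger contained box).
    """
    found = []
    heap = []
    def push(node):
        heapq.heappush(heap, (0 if contains(q, node.box) else 1,
                              -area(node.box), id(node), node))
    push(root)
    while heap:
        _, _, _, node = heapq.heappop(heap)
        if node.rect is not None:                       # leaf
            if contains(q, node.rect) and node.rect != q:
                # conflict check: skip rectangles enclosed by a found one,
                # and evict earlier candidates this rectangle encloses
                if not any(contains(f, node.rect) for f in found):
                    found = [f for f in found if not contains(node.rect, f)]
                    found.append(node.rect)
        else:
            for c in node.children:
                if overlaps(q, c.box):                  # prune disjoint subtrees
                    push(c)
    return found
```

On three leaves of which two nest inside the query, only the outer of the two survives as a close descendant; a leaf disjoint from the query is pruned without being visited.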
Option #1 -- use look-ahead
The first option is to use look-ahead for subqueries such as rectangle enclosure and rectangle containment, in conjunction with the corner transformation. The close-descendant query then does not use look-ahead directly; its subqueries do. We then have a coefficient of $d$ for the edge checks of the look-ahead-using queries. We won't go into much detail about look-ahead, except that it requires roughly disjoint bounding boxes for siblings (though a shared edge is allowed) and that it amounts to knowing that one child out of two subtrees definitely has a match (which is more straightforward to notice assuming we also use the corner transformation) via $d$ edge checks at each node. It should be noted that, even without look-ahead, enclosure and containment already take at least $d$ time per node for an R-tree. We modify the enclosure subquery to use early stopping, which reduces the time for such a query from $O(\log^2(n))$ to $O(\log(n))$. The time for each close-descendant query is in $O(k \cdot \log^2(n))$, where $k$ is the number of candidate close descendants, with $k$ in $O(n)$. For $n$ close-descendant queries overall we thus take time in $O(k \cdot n \cdot \log^2(n)) = O(n ^ 2 \cdot \log^2(n))$. Since this is less than cubic in $n$ (though we omit a factor of $d$, assuming it is moderately low and fixed), this approach can be preferable to the standard one.
Context
StackExchange Computer Science Q#65689, answer score: 5