patternMinor
Weighted closest-pair-of-points problem
Viewed 0 times
problemweightedpointsclosestpair
Problem
I want to solve the following optimisation problem (an approximation or heuristic would be helpful as well).
I have two sets of points in the plane:
$P=\left\{ p_{1},p_{2},\dots,p_{N}\right\} $ and $Q=\left\{ q_{1},q_{2},\dots,q_{M}\right\} $.
Each point has a value/weight, i.e there is a function $v:P\cup Q\rightarrow\mathbb{R}$ which assigns a weight/value to each point.
I want to find a pair of points, one in $P$ and one in $Q$, which maximizes the function $f:P\times Q\rightarrow \mathbb{R} $,
$
f\left(p,q\right)=v\left(p\right)+v\left(q\right)-\alpha\cdot d\left(p,q\right)
$
where $d$ is the Manhattan distance between the points.
So on one hand I want points with a high value, on the other hand I want them to be close together. $\alpha$ is just a constant which I will determine according to the specific application and it assigns a "relative importance" of the distance relative to the weights.
The choice for Manhattan distance is because I want to use this with actual geographical points which are connected by actual road networks.
Does anyone have a suggestion how to solve this without an exhaustive search over all pairs? even an approximation or heuristic suggestion would be helpful.
Thank you!
I have two sets of points in the plane:
$P=\left\{ p_{1},p_{2},\dots,p_{N}\right\} $ and $Q=\left\{ q_{1},q_{2},\dots,q_{M}\right\} $.
Each point has a value/weight, i.e there is a function $v:P\cup Q\rightarrow\mathbb{R}$ which assigns a weight/value to each point.
I want to find a pair of points, one in $P$ and one in $Q$, which maximizes the function $f:P\times Q\rightarrow \mathbb{R} $,
$
f\left(p,q\right)=v\left(p\right)+v\left(q\right)-\alpha\cdot d\left(p,q\right)
$
where $d$ is the Manhattan distance between the points.
So on one hand I want points with a high value, on the other hand I want them to be close together. $\alpha$ is just a constant which I will determine according to the specific application and it assigns a "relative importance" of the distance relative to the weights.
The choice for Manhattan distance is because I want to use this with actual geographical points which are connected by actual road networks.
Does anyone have a suggestion how to solve this without an exhaustive search over all pairs? even an approximation or heuristic suggestion would be helpful.
Thank you!
Solution
Suggested solution: use a k-d tree
I recommend you use a k-d tree to store the points of $Q$, with $k=3$ (3 dimensions). The three dimensions of each point are the $x$-coordinate, the $y$-coordinate, and the $v$-value (the value/weight of the point).
Store all of the points of $Q$ in a k-d tree. Then given a point $p$, you can use the structure of the $k$-d tree to find the point $q \in Q$ that maximizes $f(p,q)$ in a relatively efficient way -- more efficient than naively testing each possible point $q \in Q$. Then, you can iterate through the points of $P$, and for each point $p \in P$, use this procedure to find the point $q \in Q$ that maximizes $f(p,q)$. Keep track of the best pair you've seen, and this will solve your problem.
I guess I'd better describe how to use the k-d tree. Recall that each node in the k-d tree splits by a binary condition of the form $x \le c$ or $y \le c$ or $v \le c$, where $c$ is some constant (all points that satisfy the condition are stored in the left subtree of that node and all points that don't satisfy the condition are stored in the right subtree). Each point $q \in Q$ is stored somewhere in the tree. So, any node $s$ in the k-d tree corresponds to a subset $S_s \subseteq Q$ of points from $Q$, namely, the points that appear somewhere in the subtree rooted at $s$. Due to the splitting criteria, each set $S_s$ has the form $Q \cap R$ where $R$ is some axis-aligned 3-dimensional rectangle. For a particular node $s$, define
$$\begin{align*}
s.x_\text{max} &= \max\{q.x : q \in S_s\}\\
s.x_\text{min} &= \min\{q.x : q \in S_s\}\\
s.y_\text{max} &= \max\{q.y : q \in S_s\}\\
s.y_\text{min} &= \min\{q.y : q \in S_s\}\\
s.v_\text{max} &= \max\{v(q) : q \in S_s\}\\
s.v_\text{min} &= \min\{v(q) : q \in S_s\}
\end{align*}$$
As you form the k-d tree, precalculate the above 6 values for every node and store them in that node of the k-d tree, so they're available for ready lookup.
Define $R_s = [s.x_\text{min},s.x_\text{max}] \times [s.y_\text{min},s.y_\text{max}] \times [s.v_\text{min},s.v_\text{max}]$. We can see that $S_s = Q \cap R_s$. It will be helpful to define
$$d(p,R_s) = \min \{d(p,r) : r \in R_s\}.$$
Note that you can efficiently calculate $d(p,R_s)$, given the point $p$ and the 6 values associated with $s$. (Basically, this involves a case analysis through 27 cases, depending upon where $p$ is relative to $R_s$. The running time to compute $d(p,R_S)$ is $O(1)$.)
Let's fix a point $p$. I'm going to describe an algorithm to find the point $q \in Q$ that maximizes $f(p,q)$, by recursively searching the k-d tree for $Q$ in a suitable way. The idea is that we'll recursively visit the nodes of the k-d tree, but pruning the traversal when we reach a node that cannot possibly hold any point $q \in Q$ that will be better than the best seen so far.
To help us prune the traversal, notice that we can bound the maximum possible value of $f(p,q)$ over all $q \in S_s$, as follows:
$$f(p,q) \le v(p) + s.v_\text{max} - \alpha \cdot d(p,R_s) \text{ for all } q \in S_s.$$
Thus, whenever our recursive traversal reaches a node $s$ of the k-d tree, we'll use the above bound to figure out whether there is any possibility that exploring the subtree rooted at $s$ could possibly yield some improvement over the best result seen so far. If not, we'll prune the traversal and won't visit any of the children or descendants of $s$.
Thus, the algorithm looks something like the following:
Given a k-d tree for $Q$ and a point $p \in P$, this algorithm computes $\max \{f(p,q) : q \in Q\}$. It will visit some of the nodes of the k-d tree but not all of them. So, iterate through the points $p \in P$ and call
The above is correct but not necessarily optimal in performance. As an optimization, I suggest that at nodes that split on the $x$-coordinate or $y$-coordinate, you should choose the order of the recursive traversal to prefer the subtree that contains $p$; and for nodes that split on $v$-value, always prefer the subtree with the larger $v$-value. In other words, I suggest you replace the last two lines of
This will choose a traversal ordering that is more likely to help you prune many unpr
I recommend you use a k-d tree to store the points of $Q$, with $k=3$ (3 dimensions). The three dimensions of each point are the $x$-coordinate, the $y$-coordinate, and the $v$-value (the value/weight of the point).
Store all of the points of $Q$ in a k-d tree. Then given a point $p$, you can use the structure of the $k$-d tree to find the point $q \in Q$ that maximizes $f(p,q)$ in a relatively efficient way -- more efficient than naively testing each possible point $q \in Q$. Then, you can iterate through the points of $P$, and for each point $p \in P$, use this procedure to find the point $q \in Q$ that maximizes $f(p,q)$. Keep track of the best pair you've seen, and this will solve your problem.
I guess I'd better describe how to use the k-d tree. Recall that each node in the k-d tree splits by a binary condition of the form $x \le c$ or $y \le c$ or $v \le c$, where $c$ is some constant (all points that satisfy the condition are stored in the left subtree of that node and all points that don't satisfy the condition are stored in the right subtree). Each point $q \in Q$ is stored somewhere in the tree. So, any node $s$ in the k-d tree corresponds to a subset $S_s \subseteq Q$ of points from $Q$, namely, the points that appear somewhere in the subtree rooted at $s$. Due to the splitting criteria, each set $S_s$ has the form $Q \cap R$ where $R$ is some axis-aligned 3-dimensional rectangle. For a particular node $s$, define
$$\begin{align*}
s.x_\text{max} &= \max\{q.x : q \in S_s\}\\
s.x_\text{min} &= \min\{q.x : q \in S_s\}\\
s.y_\text{max} &= \max\{q.y : q \in S_s\}\\
s.y_\text{min} &= \min\{q.y : q \in S_s\}\\
s.v_\text{max} &= \max\{v(q) : q \in S_s\}\\
s.v_\text{min} &= \min\{v(q) : q \in S_s\}
\end{align*}$$
As you form the k-d tree, precalculate the above 6 values for every node and store them in that node of the k-d tree, so they're available for ready lookup.
Define $R_s = [s.x_\text{min},s.x_\text{max}] \times [s.y_\text{min},s.y_\text{max}] \times [s.v_\text{min},s.v_\text{max}]$. We can see that $S_s = Q \cap R_s$. It will be helpful to define
$$d(p,R_s) = \min \{d(p,r) : r \in R_s\}.$$
Note that you can efficiently calculate $d(p,R_s)$, given the point $p$ and the 6 values associated with $s$. (Basically, this involves a case analysis through 27 cases, depending upon where $p$ is relative to $R_s$. The running time to compute $d(p,R_S)$ is $O(1)$.)
Let's fix a point $p$. I'm going to describe an algorithm to find the point $q \in Q$ that maximizes $f(p,q)$, by recursively searching the k-d tree for $Q$ in a suitable way. The idea is that we'll recursively visit the nodes of the k-d tree, but pruning the traversal when we reach a node that cannot possibly hold any point $q \in Q$ that will be better than the best seen so far.
To help us prune the traversal, notice that we can bound the maximum possible value of $f(p,q)$ over all $q \in S_s$, as follows:
$$f(p,q) \le v(p) + s.v_\text{max} - \alpha \cdot d(p,R_s) \text{ for all } q \in S_s.$$
Thus, whenever our recursive traversal reaches a node $s$ of the k-d tree, we'll use the above bound to figure out whether there is any possibility that exploring the subtree rooted at $s$ could possibly yield some improvement over the best result seen so far. If not, we'll prune the traversal and won't visit any of the children or descendants of $s$.
Thus, the algorithm looks something like the following:
def findbestpoint(p):
bestsofar := -infinity
visit(p, root of k-d tree for Q)
return bestsofar
def visit(p, s):
if v(p) + s.v_max - alpha * d(p,R_s) <= bestsofar:
return
update bestsofar based upon the point q stored in node s (if any)
if s is a leaf:
return
visit(p, s.rightchild)
visit(p, s.leftchild)Given a k-d tree for $Q$ and a point $p \in P$, this algorithm computes $\max \{f(p,q) : q \in Q\}$. It will visit some of the nodes of the k-d tree but not all of them. So, iterate through the points $p \in P$ and call
findbestpoint() on each one.The above is correct but not necessarily optimal in performance. As an optimization, I suggest that at nodes that split on the $x$-coordinate or $y$-coordinate, you should choose the order of the recursive traversal to prefer the subtree that contains $p$; and for nodes that split on $v$-value, always prefer the subtree with the larger $v$-value. In other words, I suggest you replace the last two lines of
visit(p, s) withif the condition associated with s is a split on the v-value:
visit(p, s.rightchild)
visit(p, s.leftchild)
else if p satisfies the condition associated with s (i.e., p is contained within the subtree rooted at s.leftchild):
visit(p, s.leftchild)
visit(p, s.rightchild)
else:
visit(p, s.rightchild)
visit(p, s.leftchild)This will choose a traversal ordering that is more likely to help you prune many unpr
Code Snippets
def findbestpoint(p):
bestsofar := -infinity
visit(p, root of k-d tree for Q)
return bestsofar
def visit(p, s):
if v(p) + s.v_max - alpha * d(p,R_s) <= bestsofar:
return
update bestsofar based upon the point q stored in node s (if any)
if s is a leaf:
return
visit(p, s.rightchild)
visit(p, s.leftchild)if the condition associated with s is a split on the v-value:
visit(p, s.rightchild)
visit(p, s.leftchild)
else if p satisfies the condition associated with s (i.e., p is contained within the subtree rooted at s.leftchild):
visit(p, s.leftchild)
visit(p, s.rightchild)
else:
visit(p, s.rightchild)
visit(p, s.leftchild)def findbestpair():
bestsofar := -infinity
visitpair(root of k-d tree for P, root of k-d tree for Q)
return bestsofar
def visitpair(s, t):
if s.v_max + t.v_max - alpha * d(R_s,R_t) <= bestsofar:
return
if s is a leaf and t is a leaf:
let p := the point contained within s
let q := the point contained within t
if f(p,q) > bestsofar:
bestsofar := f(p,q)
else if s is a leaf:
visit(s, t.rightchild)
visit(s, t.leftchild)
else if t is a leaf:
visit(s.rightchild, t)
visit(s.leftchild, t)
else:
visit(s.rightchild, t.rightchild)
visit(s.leftchild, t.leftchild)
visit(s.leftchild, t.rightchild)
visit(s.leftchild, t.leftchild)Context
StackExchange Computer Science Q#41080, answer score: 2
Revisions (0)
No revisions yet.