Why can I look at a graph and immediately find the closest point to another point, but it takes me O(n) time through programming?
Problem
Let me clarify:
Given a scatterplot of $n$ points, if I want to find the closest point to any given point mentally, I can immediately ignore most of the graph, narrowing my choices down to some small, roughly constant number of nearby points.
Yet, in programming, given a set of $n$ points, finding the closest point to any one of them requires checking every other point, which takes ${\cal O}(n)$ time.
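The linear scan described here can be written as a minimal Python sketch (the tuple representation of points is my own choice):

```python
import math

def closest_point(query, points):
    """Return the point in `points` nearest to `query` by scanning all of them: O(n)."""
    best = None
    best_d = math.inf
    for p in points:
        d = math.dist(query, p)  # Euclidean distance
        if d < best_d:
            best, best_d = p, d
    return best

# Every point must be examined exactly once, hence linear time.
print(closest_point((0, 0), [(3, 4), (1, 1), (-2, 0.5)]))  # → (1, 1)
```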
I am guessing that visual inspection of a graph is the equivalent of some data structure I am incapable of understanding; because with programming, by converting the points to a more structured form such as a quadtree, one can find the closest points to $k$ query points among $n$ in $O(k \log n)$ time, i.e. amortized ${\cal O}(\log n)$ per query.
But there is still no known amortized ${\cal O}(1)$ algorithm (that I can find) for nearest-point queries, even after restructuring the data.
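The logarithmic query time mentioned here can be sketched with a k-d tree, a close relative of the quadtree named in the question. This is my own self-contained illustration, not an optimized implementation:

```python
import math

def build_kdtree(points, depth=0):
    """Recursively build a 2-d tree, alternating the splitting axis between x and y."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, query, depth=0, best=None):
    """Nearest-neighbour search, pruning any half-plane that cannot hold a closer point."""
    if node is None:
        return best
    point = node["point"]
    if best is None or math.dist(query, point) < math.dist(query, best):
        best = point
    axis = depth % 2
    diff = query[axis] - point[axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, depth + 1, best)
    # Visit the far side only if the splitting line is closer than the current best.
    if abs(diff) < math.dist(query, best):
        best = nearest(far, query, depth + 1, best)
    return best
```

On reasonably distributed data each query skips most of the tree, giving the $O(\log n)$ behaviour per query that the question refers to; the worst case is still linear.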
So why does this appear to be possible with mere visual inspection?
Solution
Your model of what you do mentally is incorrect. In fact, you operate in two steps:

1. Eliminate all points that are too far, in $O(1)$ time.
2. Measure the $m$ points that are about as close, in $\Theta(m)$ time.
If you've played games like pétanque (bowls) or curling, this should be familiar — you don't need to examine the objects that are very far from the target, but you may need to measure the closest contenders.
To illustrate this point, which green dot is closest to the red dot? (Only by a little over 1 pixel, but there is one that's closest.) To make things easier, the dots have even been color-coded by distance.
This picture contains $m=10$ points which are nearly on a circle, and $n \gg 10$ green points in total. Step 1 lets you eliminate all but about $m$ points, but step 2 requires checking each of the $m$ points. There is no a priori bound for $m$.
A physical observation lets you shrink the problem size from the whole set of $n$ points to a restricted candidate set of $m$ points. This step is not a computation step as commonly understood, because it is based on a continuous process. Continuous processes are not subject to the usual intuitions about computational complexity and in particular to asymptotic analysis.
Now you may ask: why can't a continuous process solve the problem completely? How does it narrow things down to these $m$ points, and why can't we refine the process to reach $m=1$?
The answer is that I cheated a bit: I presented a set of points generated to consist of $m$ almost-closest points and $n-m$ points which are farther away. In general, determining which points lie within a precise boundary requires a precise observation which has to be performed point by point. A coarse process of elimination lets you exclude many obvious non-candidates, but merely deciding which candidates are left requires enumerating them.
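To make the "cheat" concrete, here is one way such a point set could be generated (my own construction; the radii and jitter values are arbitrary choices):

```python
import math
import random

def near_circle_points(m, n, radius=10.0, jitter=0.01, seed=0):
    """m points almost exactly `radius` from the origin, plus n-m much farther points."""
    rng = random.Random(seed)
    pts = []
    for i in range(m):
        theta = 2 * math.pi * i / m
        r = radius + rng.uniform(-jitter, jitter)  # near-ties: any of these may win
        pts.append((r * math.cos(theta), r * math.sin(theta)))
    for _ in range(n - m):
        theta = rng.uniform(0, 2 * math.pi)
        r = rng.uniform(2 * radius, 4 * radius)    # clearly farther: easy to eliminate
        pts.append((r * math.cos(theta), r * math.sin(theta)))
    return pts
```

The $n-m$ outer points are trivial to discard at a glance, but only a point-by-point measurement of the $m$ near-ties reveals the actual winner.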
You can model this system in a discrete, computational world. Assume that the points are represented in a data structure that sorts them into cells on a grid, i.e. the point $(x,y)$ is stored in a list for the cell $(\lfloor x \rfloor, \lfloor y \rfloor)$. If you're looking for the points that are closest to $(x_0, y_0)$ and the cell that contains this point contains at most one other point, then it is sufficient to check the containing cell and the 8 neighboring cells. The total number of points in these 9 cells is $m$. This model respects some key properties of the human model:
- Eliminate all points that are too far, in $O(1)$ time.
- Measure the $m$ points that are about as close, in $\Theta(m)$ time.
- $m$ is potentially unbounded: a degenerate worst case, e.g. points lying almost on a circle, is always possible.
- The practical efficiency depends on having selected a scale that matches the data (e.g. you'll save nothing if your dots are on a piece of paper and your cells are 1 km wide).
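The grid model can be sketched directly in Python (a minimal illustration with unit cells, assuming the query's cell is non-empty as stated in the solution):

```python
import math
from collections import defaultdict

def build_grid(points):
    """Bucket each point into the unit cell (floor(x), floor(y))."""
    grid = defaultdict(list)
    for (x, y) in points:
        grid[(math.floor(x), math.floor(y))].append((x, y))
    return grid

def closest_in_grid(grid, query):
    """Check only the query's cell and its 8 neighbours: the m candidate points."""
    cx, cy = math.floor(query[0]), math.floor(query[1])
    candidates = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            candidates.extend(grid.get((cx + dx, cy + dy), []))
    # Step 1 eliminated everything else in O(1); step 2 costs Theta(m).
    return min(candidates, key=lambda p: math.dist(query, p), default=None)
```

If all nine cells are empty the search would have to widen, which is exactly the scale caveat: the cell size must match the data for the elimination step to pay off.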
Context
StackExchange Computer Science Q#22693, answer score: 123