snippetsqlModerate
How to Best Implement nearest neighbour search in mysql?
Viewed 0 times
howimplementsearchnearestmysqlneighbourbest
Problem
So, in short,
Detail:
I have 100k biz record each with lattitude and longitude. I see that MySQL actually support a data type called point. Should I use that instead?
Does MySQL support KDTree storage system http://en.wikipedia.org/wiki/File:KDTree-animation.gif
Is it best to use point data type rather than regular float data type to store latitutude and longitude?
Eventually I want to find things like the first 100 restaurants closest to points 105,6 for example and my databases contains a lot of biz and points. Obviously computing the distance one by one for every records and for every points would be O(n) and hence sucks.
Notice that I am aware of a simpler solution described in How do Application Like Yelp Retrieve distance information from data base efficiently and will implement that my self too for a start. That's a good answer.
However, I think there is one creme of the crop answer that should outperform that right? In fact, storing location based on latitude and longitude and finding stuffs nearest to it is a very common problem I expect mysql to have a special design pattern for that. Does it have that?
Where can I learn more about it? Thanks.
- What should be the data type of latitude and longitude?
- What SQL command I should call to get the first 100 nearest restaurants for example?
Detail:
I have 100k biz record each with lattitude and longitude. I see that MySQL actually support a data type called point. Should I use that instead?
Does MySQL support KDTree storage system http://en.wikipedia.org/wiki/File:KDTree-animation.gif
Is it best to use point data type rather than regular float data type to store latitutude and longitude?
Eventually I want to find things like the first 100 restaurants closest to points 105,6 for example and my databases contains a lot of biz and points. Obviously computing the distance one by one for every records and for every points would be O(n) and hence sucks.
Notice that I am aware of a simpler solution described in How do Application Like Yelp Retrieve distance information from data base efficiently and will implement that my self too for a start. That's a good answer.
However, I think there is one creme of the crop answer that should outperform that right? In fact, storing location based on latitude and longitude and finding stuffs nearest to it is a very common problem I expect mysql to have a special design pattern for that. Does it have that?
Where can I learn more about it? Thanks.
Solution
As far as design patterns, the Yelp question is pretty standard stuff.
For a more complex answer, you will probably need the geospatial distance. Here is a fascinating powerpoint about that topic (and here is a pdf version of that as well). However, the math involved is quite ugly.
From their slide:
There's a longer, more in-depth answer about geospatial distance on Stack Overflow.
But you still want to limit the results by latitude and longitude.
Ultimately, I would avoid the POINT datatype and go with latitude/longitude. There's currently no way to determine the distance between two POINTs, so you're going to have to store latitude/longitude for that calculation anyways.
One last link: you may also want to check out this SO thread regarding speeding up the queries using spatial indexes.
For a more complex answer, you will probably need the geospatial distance. Here is a fascinating powerpoint about that topic (and here is a pdf version of that as well). However, the math involved is quite ugly.
From their slide:
set @orig_lat=122.4058; set @orig_lon=37.7907;
set @dist=10;
SELECT *, 3956 * 2 * ASIN(SQRT(
POWER(SIN((@orig_lat - abs(dest.lat)) * pi()/180 / 2), 2) + COS(@orig_lat * pi()/180 ) * COS(abs(dest.lat) * pi()/180) * POWER(SIN((@orig_lon – dest.lon) * pi()/180 / 2), 2) )) as distance
FROM hotels dest
having distance < @dist
ORDER BY distance limit 10There's a longer, more in-depth answer about geospatial distance on Stack Overflow.
But you still want to limit the results by latitude and longitude.
Ultimately, I would avoid the POINT datatype and go with latitude/longitude. There's currently no way to determine the distance between two POINTs, so you're going to have to store latitude/longitude for that calculation anyways.
One last link: you may also want to check out this SO thread regarding speeding up the queries using spatial indexes.
Code Snippets
set @orig_lat=122.4058; set @orig_lon=37.7907;
set @dist=10;
SELECT *, 3956 * 2 * ASIN(SQRT(
POWER(SIN((@orig_lat - abs(dest.lat)) * pi()/180 / 2), 2) + COS(@orig_lat * pi()/180 ) * COS(abs(dest.lat) * pi()/180) * POWER(SIN((@orig_lon – dest.lon) * pi()/180 / 2), 2) )) as distance
FROM hotels dest
having distance < @dist
ORDER BY distance limit 10Context
StackExchange Database Administrators Q#4214, answer score: 13
Revisions (0)
No revisions yet.