patternMinor
How does Yelp efficiently calculate distance in the database?
Viewed 0 times
efficientlytheyelpdistancedatabasecalculatedoeshow
Problem
For example, say I have a table:
All are indexed of course. Also there are 1 million records
Say I want to find businesses closest to 106,5, for example, how would I do so?
If I do
for example, or if I do
In theory the computer will have to compute distance for all biz while in practice only those with lattitude and longitude within a certain range that should be computed.
So how can I do what I want in PhP, or SQL, for example?
I am grateful with the answer so far. I am using mysql and they do not have anything more efficient than the obvious solution. MySQL spatial do not have compute distance function either.
Business(BusinessID, Lattitude, Longitude)All are indexed of course. Also there are 1 million records
Say I want to find businesses closest to 106,5, for example, how would I do so?
If I do
SELECT *
FROM Business
WHERE (Some formula to compute distance here) < 2000for example, or if I do
SELECT *
FROM Business
TOP 20In theory the computer will have to compute distance for all biz while in practice only those with lattitude and longitude within a certain range that should be computed.
So how can I do what I want in PhP, or SQL, for example?
I am grateful with the answer so far. I am using mysql and they do not have anything more efficient than the obvious solution. MySQL spatial do not have compute distance function either.
Solution
If I understand the question correctly (and I'm not sure I do), you are worried about computing
This can be mitigated to a degree by using the indexes on
Where 96, 116 etc are chosen to match the unit of the value '2000' and the point on the globe you are calculating distances from.
How precisely this uses indexes will depend on your RDBMS and the choices its planner makes.
In general terms, this is a primitive way of optimising a kind of nearest neighbour search. If your RDBMS supports GiST indexes, like postgres then you should consider using them instead.
"(Some formula to compute distance here)" for every row in the table each time you do a query? This can be mitigated to a degree by using the indexes on
latitude and longitude so we only have to compute the distance for a 'box' of points containing the circle we actually want:select * from business
where (latitude>96 and latitude-5 and longitude<15) and
(Some formula to compute distance here) < 2000Where 96, 116 etc are chosen to match the unit of the value '2000' and the point on the globe you are calculating distances from.
How precisely this uses indexes will depend on your RDBMS and the choices its planner makes.
In general terms, this is a primitive way of optimising a kind of nearest neighbour search. If your RDBMS supports GiST indexes, like postgres then you should consider using them instead.
Code Snippets
select * from business
where (latitude>96 and latitude<116) and
(longitude>-5 and longitude<15) and
(Some formula to compute distance here) < 2000Context
StackExchange Database Administrators Q#4210, answer score: 8
Revisions (0)
No revisions yet.