patternsqlMinor
Finding closest points without duplicates and given distance threshold
Viewed 0 times
withoutpointsdistancefindingthresholdclosestandgivenduplicates
Problem
Given a bunch of latitudes and longitudes stored as Points, I would like to determine the points that are closest to each point given some maximum threshold in meters. This is my first stab at this:
Any ideas on how to potentially improve this, especially in performance? All indices are already set.
DECLARE @ThresholdInMeters FLOAT = 1000000.0;
IF OBJECT_ID('tempdb..#Points1') IS NOT NULL
DROP TABLE #Points1
CREATE TABLE #Points1
(
PointId INT IDENTITY(1, 1) ,
Point GEOGRAPHY NULL
)
DECLARE @LowerLatitude FLOAT;
DECLARE @UpperLatitude FLOAT;
DECLARE @RandomLatitude FLOAT;
DECLARE @LowerLongitude FLOAT;
DECLARE @UpperLongitude FLOAT;
DECLARE @RandomLongitude FLOAT;
DECLARE @Counter INT;
SET @LowerLatitude = -90
SET @UpperLatitude = 90
SET @LowerLongitude = -180
SET @UpperLongitude = 180
SET @Counter = 100;
WHILE ( @Counter > 0 )
BEGIN
SELECT @RandomLatitude = ( ( @UpperLatitude - @LowerLatitude - 1 )
* RAND() + @LowerLatitude );
SELECT @RandomLongitude = ( ( @UpperLongitude - @LowerLongitude - 1 )
* RAND() + @LowerLongitude );
INSERT INTO #Points1
SELECT GEOGRAPHY::Point(@RandomLatitude, @RandomLongitude,
4326)
SET @Counter = @Counter - 1;
END
SELECT
Points1.PointId AS StartPointId,
Points2.PointId AS EndPointId,
Points1.Point.STDistance(Points2.Point) AS Distance
FROM #Points1 AS Points1
INNER JOIN #Points1 AS Points2 ON Points1.PointId <= Points2.PointId
WHERE Points1.Point.STDistance(Points2.Point) < @ThresholdInMetersAny ideas on how to potentially improve this, especially in performance? All indices are already set.
Solution
Style
Remove unneccessary parentheses
You don't need parentheses around the condition for the
loop. You can just write
Also, the extra sets of parentheses around the random numbers are
superfluous. If you leave them off, the line will be a bit less
daunting.
Don't break the line in the middle of a concept
That is, in the
just before the
the
parentheses (if really unavoidable) or break before the
Merge variable declaration and initialisation
Where possible, declare a variable just before it is used, and
initialise it there as well:
Performance
You mention that the indices are already set. I assume that means that
it is not possible to add a Spatial Index, or a primary key.
Although I would strongly suggest adding a primary key constraint to
the temp table in this example:
Unfortunately, without a Spatial Index or changes to the result
requirements (you'd need a
STDistance column), there is no choice but to check each point against
all others, which will give a large join of which I don't know how to
improve the performance.
Small point: you may gain a smaller result (only in number of rows) if
you leave out the distance from each point to itself:
but that might not fit your requirements.
Remove unneccessary parentheses
You don't need parentheses around the condition for the
WHILEloop. You can just write
WHILE @counter > 0.Also, the extra sets of parentheses around the random numbers are
superfluous. If you leave them off, the line will be a bit less
daunting.
Don't break the line in the middle of a concept
That is, in the
SELECT @RandomLatitude line, the statement breaksjust before the
* operator. As it has a higher precedence thanthe
+, this can be a bit misleading. Either don't break, useparentheses (if really unavoidable) or break before the
+:SELECT @RandomLatitude = ( @UpperLatitude - @LowerLatitude - 1 ) * RAND()
+ @LowerLatitude;
SELECT @RandomLongitude = ( @UpperLongitude - @LowerLongitude - 1 ) * RAND()
+ @LowerLongitude;Merge variable declaration and initialisation
Where possible, declare a variable just before it is used, and
initialise it there as well:
DECLARE @LowerLatitude FLOAT = -90;
DECLARE @UpperLatitude FLOAT = 90;
DECLARE @LowerLongitude FLOAT = -180;
DECLARE @UpperLongitude FLOAT = 180;
DECLARE @Counter INT = 1000;
WHILE @Counter > 0
BEGIN
DECLARE @RandomLatitude FLOAT = ( @UpperLatitude - @LowerLatitude - 1 ) * RAND()
+ @LowerLatitude
DECLARE @RandomLongitude FLOAT = ( @UpperLongitude - @LowerLongitude - 1 ) * RAND()
+ @LowerLongitude
-- ...Performance
You mention that the indices are already set. I assume that means that
it is not possible to add a Spatial Index, or a primary key.
Although I would strongly suggest adding a primary key constraint to
the temp table in this example:
CREATE TABLE #Points1
( PointId INT IDENTITY(1, 1) PRIMARY KEY
, Point GEOGRAPHY NULL
);Unfortunately, without a Spatial Index or changes to the result
requirements (you'd need a
TOP n clause and an ORDER BY on theSTDistance column), there is no choice but to check each point against
all others, which will give a large join of which I don't know how to
improve the performance.
Small point: you may gain a smaller result (only in number of rows) if
you leave out the distance from each point to itself:
INNER JOIN Points1 AS Points2 ON Points1.PointId < Points2.PointIdbut that might not fit your requirements.
Code Snippets
SELECT @RandomLatitude = ( @UpperLatitude - @LowerLatitude - 1 ) * RAND()
+ @LowerLatitude;
SELECT @RandomLongitude = ( @UpperLongitude - @LowerLongitude - 1 ) * RAND()
+ @LowerLongitude;DECLARE @LowerLatitude FLOAT = -90;
DECLARE @UpperLatitude FLOAT = 90;
DECLARE @LowerLongitude FLOAT = -180;
DECLARE @UpperLongitude FLOAT = 180;
DECLARE @Counter INT = 1000;
WHILE @Counter > 0
BEGIN
DECLARE @RandomLatitude FLOAT = ( @UpperLatitude - @LowerLatitude - 1 ) * RAND()
+ @LowerLatitude
DECLARE @RandomLongitude FLOAT = ( @UpperLongitude - @LowerLongitude - 1 ) * RAND()
+ @LowerLongitude
-- ...CREATE TABLE #Points1
( PointId INT IDENTITY(1, 1) PRIMARY KEY
, Point GEOGRAPHY NULL
);INNER JOIN Points1 AS Points2 ON Points1.PointId < Points2.PointIdContext
StackExchange Code Review Q#102166, answer score: 2
Revisions (0)
No revisions yet.