patternsqlMinor
Some SQL Server geospatial queries take much, much longer than others
Viewed 0 times
muchsqltakelongerthangeospatialotherssomeserverqueries
Problem
We have a simple SQL Server table with geospatial data that looks like this:
We have something like 100k+ rows in it right now, but that's expected to grow to millions.
We run queries on it that look like this:
Here's the bit that's weird though. The data in that table is spotty: we've got it for the southern parts of a state, for instance, but not for the whole state. If the point we search for is within a few hundred meters of points we've got data for (for instance, from the southern part of the state), the query returns subsecond. But if it's, say, 100 kilometers from the nearest data point (for instance, if the target point is from the northern part of the state), then the query will take up to 3 minutes or so to return. In both instances, query plans indicate that they're starting with a scan of the geospatial index, so it's not the problem that sometimes happens, that SQL Server can't figure out it's supposed to use the index in question.
My assumption is that it has something to do with how the geospatial index is laid out.
```
CREATE SPATIAL INDEX IX_Factors_Spatial
ON [dbo].[Factors] (GeoLocation)
USING GEOGRAPHY_AUTO_GRID
WITH (
CELLS_PER_OBJECT = 16,
PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF,
ONLINE = OFF,
ALLOW_ROW_LOCK
CREATE TABLE [dbo].[Factors](
[Id] [int] IDENTITY(1,1) NOT NULL,
[StateCode] [nvarchar](2) NOT NULL,
[GeoLocation] [geography] NULL,
[Factor] [decimal](18, 6) NOT NULL,
CONSTRAINT [PK_dbo.Factors] PRIMARY KEY CLUSTERED
(
[Id] ASC
)We have something like 100k+ rows in it right now, but that's expected to grow to millions.
We run queries on it that look like this:
declare @state nvarchar(2) = 'AL'
declare @point geography = geography::STGeomFromText('POINT(-86.19146040 32.38225770)', 4326)
select top 3
Lat,
Lon,
Factor,
GeoLocation.STDistance(@point) as Distance
from dbo.Factors
where StateCode = @state and GeoLocation.STDistance(@point) is not null
order by DistanceHere's the bit that's weird though. The data in that table is spotty: we've got it for the southern parts of a state, for instance, but not for the whole state. If the point we search for is within a few hundred meters of points we've got data for (for instance, from the southern part of the state), the query returns subsecond. But if it's, say, 100 kilometers from the nearest data point (for instance, if the target point is from the northern part of the state), then the query will take up to 3 minutes or so to return. In both instances, query plans indicate that they're starting with a scan of the geospatial index, so it's not the problem that sometimes happens, that SQL Server can't figure out it's supposed to use the index in question.
My assumption is that it has something to do with how the geospatial index is laid out.
```
CREATE SPATIAL INDEX IX_Factors_Spatial
ON [dbo].[Factors] (GeoLocation)
USING GEOGRAPHY_AUTO_GRID
WITH (
CELLS_PER_OBJECT = 16,
PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF,
ONLINE = OFF,
ALLOW_ROW_LOCK
Solution
Short answer: Compare actual execution plans for the fast and slow variants and you'll see yourself.
When the given
When the given
If you disable the spatial index you'll see that performance of the query becomes the same for any given
See Spatial Indexes Overview if you haven't already for the basic details about the internal structures of such index.
Sample test data
Generate ~100K rows within ~20km by ~20km area.
Create default spatial index
Test queries
Execution plans and IO
Index enabled
IO. Top result is fast (7ms, 171 reads). Bottom result is slow (5,693ms, 234,662 reads).
Fast.
Slow.
Index disabled
IO. Both queries have the same number of reads (601) and same duration (~1700ms).
Plan is the same for both queries:
It is faster to scan 100K rows, than seek 100K times.
I don't know how to solve the problem, if there is a way to get the best of the both worlds and somehow automatically decide whether to use the index or not.
You can try to calculate the bounding box (min/max lat/lon) and change the logic based on whether the given point is within the bounding box.
The most interesting thing happens in that built-in Geodetic Tesselation table-valued function and I don't see how to fine-tune it.
With spatial indexes a lot depends on your data distribution.
In some cases you may be better off with two separate simple standard indexes on latitude and longitude if you know that your data is dense and you can limit the search to a narrow stripe or small area (given point +- few km).
When the given
@point is close to the points in the table, the tessellations used in the spatial index actually help to dismiss most of the rows and only few seeks of the index are necessary.When the given
@point is far from any point in the table, the engine effectively has to read all rows. It seeks an index 100K times, which is slow.If you disable the spatial index you'll see that performance of the query becomes the same for any given
@point. It will be slower than your fast variant when index is useful, but it will be faster than your slow variant when the index is harmful.See Spatial Indexes Overview if you haven't already for the basic details about the internal structures of such index.
Sample test data
CREATE TABLE [dbo].[Factors](
[Id] [int] IDENTITY(1,1) NOT NULL,
[GeoLocation] [geography] NULL,
[Factor] [decimal](18, 6) NOT NULL,
CONSTRAINT [PK_Factors] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]Generate ~100K rows within ~20km by ~20km area.
DECLARE @MinLat float = -38.180184;
DECLARE @MaxLat float = -38.000000;
DECLARE @MinLon float = 145.000000;
DECLARE @MaxLon float = 145.227707;
DECLARE @PointCount int = 317;
WITH
x AS
(
SELECT TOP (@PointCount)
ROW_NUMBER() OVER (ORDER BY [object_id]) AS rn
FROM sys.all_objects
)
INSERT INTO [dbo].[Factors]
([GeoLocation]
,[Factor])
SELECT
geography::Point(
@MinLat + (TLat.rn-1) * (@MaxLat - @MinLat) / (@PointCount-1)
,@MinLon + (TLon.rn-1) * (@MaxLon - @MinLon) / (@PointCount-1)
,4326) AS GeoLocation
,0 AS Factor
FROM
x AS TLat CROSS JOIN x AS TLon
ORDER BY TLat.rn, TLon.rn;Create default spatial index
CREATE SPATIAL INDEX [IX_GeoLocation] ON [dbo].[Factors]
(
[GeoLocation]
)USING GEOGRAPHY_AUTO_GRID
WITH (
CELLS_PER_OBJECT = 16,
PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF,
ONLINE = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]Test queries
@point1 is close to other points in the table.@point2 is far away from other points in the table.declare @point1 geography = geography::Point(-38.000000, 145.000000, 4326);
declare @point2 geography = geography::Point(+38.000000, 145.000000, 4326);
select top 3
Factor,
GeoLocation.STDistance(@point1) as Distance
from dbo.Factors
where GeoLocation.STDistance(@point1) is not null
order by Distance
option(recompile);
select top 3
Factor,
GeoLocation.STDistance(@point2) as Distance
from dbo.Factors
where GeoLocation.STDistance(@point2) is not null
order by Distance
option(recompile);Execution plans and IO
Index enabled
IO. Top result is fast (7ms, 171 reads). Bottom result is slow (5,693ms, 234,662 reads).
Fast.
Slow.
Index disabled
IO. Both queries have the same number of reads (601) and same duration (~1700ms).
Plan is the same for both queries:
It is faster to scan 100K rows, than seek 100K times.
I don't know how to solve the problem, if there is a way to get the best of the both worlds and somehow automatically decide whether to use the index or not.
You can try to calculate the bounding box (min/max lat/lon) and change the logic based on whether the given point is within the bounding box.
The most interesting thing happens in that built-in Geodetic Tesselation table-valued function and I don't see how to fine-tune it.
With spatial indexes a lot depends on your data distribution.
In some cases you may be better off with two separate simple standard indexes on latitude and longitude if you know that your data is dense and you can limit the search to a narrow stripe or small area (given point +- few km).
Code Snippets
CREATE TABLE [dbo].[Factors](
[Id] [int] IDENTITY(1,1) NOT NULL,
[GeoLocation] [geography] NULL,
[Factor] [decimal](18, 6) NOT NULL,
CONSTRAINT [PK_Factors] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]DECLARE @MinLat float = -38.180184;
DECLARE @MaxLat float = -38.000000;
DECLARE @MinLon float = 145.000000;
DECLARE @MaxLon float = 145.227707;
DECLARE @PointCount int = 317;
WITH
x AS
(
SELECT TOP (@PointCount)
ROW_NUMBER() OVER (ORDER BY [object_id]) AS rn
FROM sys.all_objects
)
INSERT INTO [dbo].[Factors]
([GeoLocation]
,[Factor])
SELECT
geography::Point(
@MinLat + (TLat.rn-1) * (@MaxLat - @MinLat) / (@PointCount-1)
,@MinLon + (TLon.rn-1) * (@MaxLon - @MinLon) / (@PointCount-1)
,4326) AS GeoLocation
,0 AS Factor
FROM
x AS TLat CROSS JOIN x AS TLon
ORDER BY TLat.rn, TLon.rn;CREATE SPATIAL INDEX [IX_GeoLocation] ON [dbo].[Factors]
(
[GeoLocation]
)USING GEOGRAPHY_AUTO_GRID
WITH (
CELLS_PER_OBJECT = 16,
PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF,
ONLINE = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]declare @point1 geography = geography::Point(-38.000000, 145.000000, 4326);
declare @point2 geography = geography::Point(+38.000000, 145.000000, 4326);
select top 3
Factor,
GeoLocation.STDistance(@point1) as Distance
from dbo.Factors
where GeoLocation.STDistance(@point1) is not null
order by Distance
option(recompile);
select top 3
Factor,
GeoLocation.STDistance(@point2) as Distance
from dbo.Factors
where GeoLocation.STDistance(@point2) is not null
order by Distance
option(recompile);Context
StackExchange Database Administrators Q#141006, answer score: 6
Revisions (0)
No revisions yet.