patternMinor
SQL Server: query performance (search 2 million rows)
Viewed 0 times
millionrowssearchsqlqueryperformanceserver
Problem
Got an interesting one for all of you SQL guru's out there. Now this search is only taking a couple of seconds, but it's quite intensive and there must be a better way. Maybe I'm expecting too much?
Simple holiday search app. 2 million holidays. Paging/Sorting around 600,000 rows.
This is the schema of the table
As you can see, quite simple. We have a property, a price, duration, departure/destination airports, etc. Now, the more field that are provided the faster the search. If I have a Departure Airport, Property and Date then the search is very fast. However, if I just have a Country and nothing else, there is a lot of data to work through.
Using this CSV export of my table, there are 2 million rows in total, and around 666k just with a country code of FR, which is my example.
This, is the search query. Which returns two tables. The first is a summary, so total number of holidays that match your criteria and how many unique properties. The second table contains the actual results from the search.
```
--Build a temp table, and store everything we need in it
CREATE TABLE #Pricing (PropertyId int, Duration int, HolidayId int, Rating int, Price int, StartDate datetime, PropertyRow int);
INSERT INTO #Pricing
SELECT
PropertyId, Duration, [Id], [Rating], [Price], DepartureDate,
ROW_NUMBER() OVER (PARTITION BY PropertyId ORDER BY Price ASC) as PropertyRow
FROM
dbo.Holiday
WHERE
DepartureDate > GETDATE() AND Country = 'FR'
--G
Simple holiday search app. 2 million holidays. Paging/Sorting around 600,000 rows.
This is the schema of the table
CREATE TABLE [dbo].[Holiday](
[Id] [int] NOT NULL,
[PropertyId] [int] NOT NULL,
[Price] [int] NOT NULL,
[Rating] [int] NOT NULL,
[Country] [char](2) NOT NULL,
[ResortId] [int] NOT NULL,
[DepartureAirport] [char](3) NOT NULL,
[DestinationAirport] [char](3) NOT NULL,
[DepartureDate] [datetime] NOT NULL,
[Basis] [char](2) NOT NULL,
[Duration] [int] NOT NULL,
CONSTRAINT [PK_Holiday] PRIMARY KEY CLUSTERED ([Id] ASC)
)As you can see, quite simple. We have a property, a price, duration, departure/destination airports, etc. Now, the more field that are provided the faster the search. If I have a Departure Airport, Property and Date then the search is very fast. However, if I just have a Country and nothing else, there is a lot of data to work through.
Using this CSV export of my table, there are 2 million rows in total, and around 666k just with a country code of FR, which is my example.
This, is the search query. Which returns two tables. The first is a summary, so total number of holidays that match your criteria and how many unique properties. The second table contains the actual results from the search.
```
--Build a temp table, and store everything we need in it
CREATE TABLE #Pricing (PropertyId int, Duration int, HolidayId int, Rating int, Price int, StartDate datetime, PropertyRow int);
INSERT INTO #Pricing
SELECT
PropertyId, Duration, [Id], [Rating], [Price], DepartureDate,
ROW_NUMBER() OVER (PARTITION BY PropertyId ORDER BY Price ASC) as PropertyRow
FROM
dbo.Holiday
WHERE
DepartureDate > GETDATE() AND Country = 'FR'
--G
Solution
Without indexes, any time the search query is run, all 2 million records have to be searched to find the 650K 'FR' instances. With indexes, the database can essentially go straight to them. Even if you left the search query as-is, I think proper indexes would give you speed improvements.
As far as the temp table, I really don't see why that couldn't be done as a sub in the final results query. Aside from that, I think temp tables would be more useful for persistent connections to the DB, or pools of users. If you're just creating the temp table and immediately destroying it... then it's basically just being used as a subquery.
Update: M_M makes a good point in his comment on this answer. However, I still feel an index would be better if the majority of activity is not just on sets where country is the only criteria. For me (just my opinion) it would come down to how often the 'FR' subset would be needed by itself with no other criteria. Otherwise, indexes could be used in the marjority of the searches.
As far as the temp table, I really don't see why that couldn't be done as a sub in the final results query. Aside from that, I think temp tables would be more useful for persistent connections to the DB, or pools of users. If you're just creating the temp table and immediately destroying it... then it's basically just being used as a subquery.
Update: M_M makes a good point in his comment on this answer. However, I still feel an index would be better if the majority of activity is not just on sets where country is the only criteria. For me (just my opinion) it would come down to how often the 'FR' subset would be needed by itself with no other criteria. Otherwise, indexes could be used in the marjority of the searches.
Context
StackExchange Database Administrators Q#14591, answer score: 2
Revisions (0)
No revisions yet.