patternsqlModerate
SQL Server Linked Server performance: Why are remote queries so expensive?
Viewed 0 times
whysqlareperformanceserverquerieslinkedexpensiveremote
Problem
I have two database servers, connected via Linked Servers. Both are SQL Server 2008R2 databases, and the linked server connection is made via a regular "SQL Server" link, using the current login's security context. The linked servers are both in the same datacentre, so the connection shouldn't be an issue.
I use the following query to check which values of the column
On both tables are non-clustered indexes on the column
When executing this query, it takes a long time indeed.
It makes me wonder: Why is this? Is the estimation "just" way off, or are remote queries on linked servers really that expensive?
I use the following query to check which values of the column
identifier are available remotely, but not locally.SELECT
identifier
FROM LinkedServer.RemoteDb.schema.[TableName]
EXCEPT
SELECT DISTINCT
identifier
FROM LocalDb.schema.[TableName]On both tables are non-clustered indexes on the column
identifier. Locally are around 2.6M rows, remotely only 54. Yet, when looking at the query plan, 70% of the execution time is devoted to "executing remote query". Also, when studying the complete query plan, the number of estimated local rows is 1 instead of 2695380 (which is the number of estimated rows when selecting only the query coming after EXCEPT). When executing this query, it takes a long time indeed.
It makes me wonder: Why is this? Is the estimation "just" way off, or are remote queries on linked servers really that expensive?
Solution
The plan you have at the moment looks like the most optimal plan to me.
I don't agree with the assertion in the other answers that it is sending the 2.6M rows to the remote server.
The plan looks to me as though for each of the 54 rows returned from the remote query it is performing an index seek into your local table to determine whether it is matched or not. This is pretty much the optimal plan.
Replacing with a hash join or merge join would be counterproductive given the size of table and adding an intermediate
I don't agree with the assertion in the other answers that it is sending the 2.6M rows to the remote server.
The plan looks to me as though for each of the 54 rows returned from the remote query it is performing an index seek into your local table to determine whether it is matched or not. This is pretty much the optimal plan.
Replacing with a hash join or merge join would be counterproductive given the size of table and adding an intermediate
#temp table just adds an additional step that doesn't seem to give you any advantage.Context
StackExchange Database Administrators Q#10873, answer score: 10
Revisions (0)
No revisions yet.