Azure SQL cross-database queries return too many rows with joins
Problem
I have a problem with Azure SQL cross-database queries: it looks like when we join data across databases, the join predicate is not pushed down to the external database.
To reproduce, I implemented a simple example based on this starting point:
https://learn.microsoft.com/en-us/azure/sql-database/sql-database-elastic-query-getting-started-vertical
As preparation I created 199,999 CustomerInformation entries and 200 OrderInformation entries.
The following query works as expected; the CustomerInformation table returns just one row:
SELECT o.OrderId, c.CustomerID, c.CustomerName
FROM OrderInformation o
JOIN CustomerInformation c ON o.CustomerID = c.CustomerID
WHERE c.CustomerID = 2;
But any query with a join that does not specify the CustomerID returns all the rows from the Customers database.
SELECT o.OrderId, c.CustomerID, c.CustomerName
FROM OrderInformation o
JOIN CustomerInformation c ON o.CustomerID = c.CustomerID
WHERE o.OrderId = 1870;
As a result, the queries are really slow because the servers transfer too many rows. Am I missing a setting, or is it not possible to separate this data with the suggested queries?
Thanks for your help.
Solution
Recreating the issue
Test data is at the bottom of this answer.
SELECT o.OrderId, c.CustomerID, c.CustomerName
FROM OrderInformation o
JOIN CustomerInformation c ON o.CustomerID = c.CustomerID
WHERE o.OrderId = 155;
All records are returned by the remote query operator, then filtered down to one row when the inner join is applied.
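You can verify this yourself by capturing the actual execution plan and inspecting the remote query operator. A minimal sketch (SET STATISTICS XML is standard T-SQL; nothing Azure-specific is assumed here):

```sql
-- Sketch: capture the actual execution plan for the query above and look at
-- the "RemoteQuery" attribute inside the returned plan XML. If the remote
-- statement has no WHERE clause, the whole remote table is transferred
-- before the join filters it locally.
SET STATISTICS XML ON;

SELECT o.OrderId, c.CustomerID, c.CustomerName
FROM OrderInformation o
JOIN CustomerInformation c ON o.CustomerID = c.CustomerID
WHERE o.OrderId = 155;

SET STATISTICS XML OFF;
```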
Some more background on pushing down the predicate to a remote database:
Previously, elastic database query could not push parameterized
operations to remote databases. As a result, sometimes large row sets
had to be unnecessarily brought local to evaluate these operations.
With the recent improvements, parameterized operations can now be
pushed to remote databases and be evaluated remotely. For a query over
an external table and a local table like the following, this can now
avoid transferring millions of rows by evaluating the selective filter
in the WHERE clause on the remote database:
DECLARE @low int
DECLARE @high int
SET @low = 100
SET @high = 200
SELECT c.CustomerId, c.Name, count(OrderId)
FROM remote_customers c
JOIN local_orders o
ON c.CustomerId = o.CustomerId
WHERE c.CustomerId > @low and c.CustomerId < @high
GROUP BY c.CustomerId, c.Name;
Source
The query above shows how parameters can define a range of rows to read from the remote table. The difference with your problem is that your filter is applied to the local table, so it cannot be pushed to the remote side directly.
If we know that this predicate can be pushed down to the remote table:
SELECT o.OrderId,c.CustomerID, c.CustomerName
FROM CustomerInformation c
JOIN OrderInformation o on o.CustomerId = c.CustomerID
WHERE o.CustomerID > 1 AND o.CustomerID < 10;
You could look for a range of CustomerIDs and filter on those.
MAX() and MIN() on o.CustomerId give the upper and lower boundaries for WHERE o.OrderId = 155:
DECLARE @MAX INT, @MIN INT;
SELECT @MAX = MAX(o.CustomerId), @MIN = MIN(o.CustomerId)
FROM OrderInformation o
WHERE o.OrderId = 155;

SELECT c.CustomerID, c.CustomerName
FROM CustomerInformation c
WHERE c.CustomerID >= @MIN AND c.CustomerID <= @MAX;
The complete result:
DECLARE @MAX INT, @MIN INT;
SELECT @MAX = MAX(o.CustomerId), @MIN = MIN(o.CustomerId)
FROM OrderInformation o
WHERE o.OrderId = 155;
;WITH CTE as(
SELECT c.CustomerID, c.CustomerName
FROM CustomerInformation c
WHERE c.customerID >= @MIN
AND c.Customerid <= @MAX
)
SELECT o.OrderId, c.CustomerID, c.CustomerName
FROM CTE c
INNER JOIN OrderInformation o
on o.customerID = c.CustomerID
WHERE o.OrderId = 155;
Pushing down the predicates:
Execution plan of this query
RemoteQuery="SELECT [c].[CustomerID] [Col1005],[c].[CustomerName] [Col1007] FROM [CustomerInformation] [c] WHERE [c].[CustomerID]>=? AND [c].[CustomerID]<=?"
How much this filters out on the remote table depends on your data and the parameters passed. YMMV.
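If the range trick does not filter enough, another option (my addition, not part of the original answer) is elastic query's sp_execute_remote, which runs a parameterized statement directly on the remote database, so the filter is guaranteed to be evaluated there. The sketch below assumes the Stackoverflow external data source defined in the test data:

```sql
-- Sketch: evaluate the filter on the remote database via sp_execute_remote.
-- 'Stackoverflow' is the external data source name from the test data below;
-- only the matching rows cross the wire.
EXEC sp_execute_remote
    N'Stackoverflow',
    N'SELECT CustomerID, CustomerName
      FROM dbo.CustomerInformation
      WHERE CustomerID >= @min AND CustomerID <= @max;',
    N'@min int, @max int',
    @min = 155, @max = 155;
```

The result comes back like any EXEC result set, so to join it with OrderInformation you would first capture it into a temp table with INSERT ... EXEC.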
Test data
CustomerInformation (remote db)
CREATE TABLE dbo.CustomerInformation (CustomerID int identity(1,1) , CustomerName nvarchar(255));
DECLARE @i int = 1;
WHILE @i < 20000
BEGIN
INSERT INTO CustomerInformation(CustomerName)
VALUES ('Frodo');
SET @i += 1;
END
On the database that will use the external table
CREATE TABLE OrderInformation(OrderId Int identity(1,1), CustomerId int,Ordervalue int);
SET NOCOUNT ON;
DECLARE @i int = 1;
WHILE @i < 200
BEGIN
INSERT INTO OrderInformation(CustomerId,Ordervalue)
VALUES (@i, 111);
SET @i += 1;
END
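One detail the scripts above omit (my addition, not from the original answer): neither table gets a key. A clustered primary key on OrderId turns the local MIN/MAX lookup for a single order into a seek, and an index on the remote table's CustomerID helps the pushed-down range predicate the same way:

```sql
-- Sketch (assumption: the tables were created exactly as above, with no keys).
-- Local database:
ALTER TABLE dbo.OrderInformation
    ADD CONSTRAINT PK_OrderInformation PRIMARY KEY CLUSTERED (OrderId);

-- Remote database (run there directly, not through the external table):
CREATE UNIQUE CLUSTERED INDEX IX_CustomerInformation_CustomerID
    ON dbo.CustomerInformation (CustomerID);
```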
CREATE EXTERNAL TABLE [dbo].CustomerInformation(
CustomerID [int],
CustomerName nvarchar(255))
WITH
(
DATA_SOURCE = Stackoverflow
);
Context
StackExchange Database Administrators Q#226895, answer score: 2