patternsqlCritical
MERGE a subset of the target table
Viewed 0 times
thetargetmergesubsettable
Problem
I am trying to use a
It is important to specify only the columns from the target table that are used for matching purposes. That is, specify columns from the target table that are compared to the corresponding column of the source table. Do not attempt to improve query performance by filtering out rows in the target table in the ON clause, such as by specifying AND NOT target_table.column_x = value. Doing so may return unexpected and incorrect results.
but this is exactly what it appears I have to do to make my
The data I have is a standard many-to-many join table of items to categories (e.g. which items are included in which categories) like so:
What I need to do is to effectively replace all rows in a specific category with a new list of items. My initial attempt to do this looks like this:
This appears to be working in my tests, but I am doing exactly what MSDN explicitly warns me not to do. This makes me concerned that I will run into unexpected problems later on, but I cannot see any other way to make my
Is there a "more correct" way to achieve this same result? And what are the "unexpected or incorrect results" that MSDN is warning me about?
MERGE statement to insert or delete rows from a table, but I only want to act on a subset of those rows. The documentation for MERGE has a pretty strongly worded warning:It is important to specify only the columns from the target table that are used for matching purposes. That is, specify columns from the target table that are compared to the corresponding column of the source table. Do not attempt to improve query performance by filtering out rows in the target table in the ON clause, such as by specifying AND NOT target_table.column_x = value. Doing so may return unexpected and incorrect results.
but this is exactly what it appears I have to do to make my
MERGE work.The data I have is a standard many-to-many join table of items to categories (e.g. which items are included in which categories) like so:
CategoryId ItemId
========== ======
1 1
1 2
1 3
2 1
2 3
3 5
3 6
4 5What I need to do is to effectively replace all rows in a specific category with a new list of items. My initial attempt to do this looks like this:
MERGE INTO CategoryItem AS TARGET
USING (
SELECT ItemId FROM SomeExternalDataSource WHERE CategoryId = 2
) AS SOURCE
ON SOURCE.ItemId = TARGET.ItemId AND TARGET.CategoryId = 2
WHEN NOT MATCHED BY TARGET THEN
INSERT ( CategoryId, ItemId )
VALUES ( 2, ItemId )
WHEN NOT MATCHED BY SOURCE AND TARGET.CategoryId = 2 THEN
DELETE ;This appears to be working in my tests, but I am doing exactly what MSDN explicitly warns me not to do. This makes me concerned that I will run into unexpected problems later on, but I cannot see any other way to make my
MERGE only affect rows with the specific field value (CategoryId = 2) and ignore rows from other categories.Is there a "more correct" way to achieve this same result? And what are the "unexpected or incorrect results" that MSDN is warning me about?
Solution
The
Target
Source
The desired outcome is to replace data in the target with data from the source, but only for
This gives the following results:
The execution plan is:
Notice both tables are scanned fully. We might think this inefficient, because only rows where
The logic in the
The root cause is the same reason predicates behave differently in an outer join
The guidance in Books Online (expanded in the Optimizing Performance entry) offers guidance that will ensure the correct semantic is expressed using
The documentation offers three potential ways to impleme
MERGE statement has a complex syntax and an even more complex implementation, but essentially the idea is to join two tables, filter down to rows that need to be changed (inserted, updated, or deleted), and then to perform the requested changes. Given the following sample data:DECLARE @CategoryItem AS TABLE
(
CategoryId integer NOT NULL,
ItemId integer NOT NULL,
PRIMARY KEY (CategoryId, ItemId),
UNIQUE (ItemId, CategoryId)
);
DECLARE @DataSource AS TABLE
(
CategoryId integer NOT NULL,
ItemId integer NOT NULL
PRIMARY KEY (CategoryId, ItemId)
);
INSERT @CategoryItem
(CategoryId, ItemId)
VALUES
(1, 1),
(1, 2),
(1, 3),
(2, 1),
(2, 3),
(3, 5),
(3, 6),
(4, 5);
INSERT @DataSource
(CategoryId, ItemId)
VALUES
(2, 2);Target
╔════════════╦════════╗
║ CategoryId ║ ItemId ║
╠════════════╬════════╣
║ 1 ║ 1 ║
║ 2 ║ 1 ║
║ 1 ║ 2 ║
║ 1 ║ 3 ║
║ 2 ║ 3 ║
║ 3 ║ 5 ║
║ 4 ║ 5 ║
║ 3 ║ 6 ║
╚════════════╩════════╝Source
╔════════════╦════════╗
║ CategoryId ║ ItemId ║
╠════════════╬════════╣
║ 2 ║ 2 ║
╚════════════╩════════╝The desired outcome is to replace data in the target with data from the source, but only for
CategoryId = 2. Following the description of MERGE given above, we should write a query that joins the source and target on the keys only, and filter rows only in the WHEN clauses:MERGE INTO @CategoryItem AS TARGET
USING @DataSource AS SOURCE ON
SOURCE.ItemId = TARGET.ItemId
AND SOURCE.CategoryId = TARGET.CategoryId
WHEN NOT MATCHED BY SOURCE
AND TARGET.CategoryId = 2
THEN DELETE
WHEN NOT MATCHED BY TARGET
AND SOURCE.CategoryId = 2
THEN INSERT (CategoryId, ItemId)
VALUES (CategoryId, ItemId)
OUTPUT
$ACTION,
ISNULL(INSERTED.CategoryId, DELETED.CategoryId) AS CategoryId,
ISNULL(INSERTED.ItemId, DELETED.ItemId) AS ItemId
;This gives the following results:
╔═════════╦════════════╦════════╗
║ $ACTION ║ CategoryId ║ ItemId ║
╠═════════╬════════════╬════════╣
║ DELETE ║ 2 ║ 1 ║
║ INSERT ║ 2 ║ 2 ║
║ DELETE ║ 2 ║ 3 ║
╚═════════╩════════════╩════════╝
╔════════════╦════════╗
║ CategoryId ║ ItemId ║
╠════════════╬════════╣
║ 1 ║ 1 ║
║ 1 ║ 2 ║
║ 1 ║ 3 ║
║ 2 ║ 2 ║
║ 3 ║ 5 ║
║ 3 ║ 6 ║
║ 4 ║ 5 ║
╚════════════╩════════╝The execution plan is:
Notice both tables are scanned fully. We might think this inefficient, because only rows where
CategoryId = 2 will be affected in the target table. This is where the warnings in Books Online come in. One misguided attempt to optimize to touch only necessary rows in the target is:MERGE INTO @CategoryItem AS TARGET
USING
(
SELECT CategoryId, ItemId
FROM @DataSource AS ds
WHERE CategoryId = 2
) AS SOURCE ON
SOURCE.ItemId = TARGET.ItemId
AND TARGET.CategoryId = 2
WHEN NOT MATCHED BY TARGET THEN
INSERT (CategoryId, ItemId)
VALUES (CategoryId, ItemId)
WHEN NOT MATCHED BY SOURCE THEN
DELETE
OUTPUT
$ACTION,
ISNULL(INSERTED.CategoryId, DELETED.CategoryId) AS CategoryId,
ISNULL(INSERTED.ItemId, DELETED.ItemId) AS ItemId
;The logic in the
ON clause is applied as part of the join. In this case, the join is a full outer join (see this Books Online entry for why). Applying the check for category 2 on the target rows as part of an outer join ultimately results in rows with a different value being deleted (because they do not match the source):╔═════════╦════════════╦════════╗
║ $ACTION ║ CategoryId ║ ItemId ║
╠═════════╬════════════╬════════╣
║ DELETE ║ 1 ║ 1 ║
║ DELETE ║ 1 ║ 2 ║
║ DELETE ║ 1 ║ 3 ║
║ DELETE ║ 2 ║ 1 ║
║ INSERT ║ 2 ║ 2 ║
║ DELETE ║ 2 ║ 3 ║
║ DELETE ║ 3 ║ 5 ║
║ DELETE ║ 3 ║ 6 ║
║ DELETE ║ 4 ║ 5 ║
╚═════════╩════════════╩════════╝
╔════════════╦════════╗
║ CategoryId ║ ItemId ║
╠════════════╬════════╣
║ 2 ║ 2 ║
╚════════════╩════════╝The root cause is the same reason predicates behave differently in an outer join
ON clause than they do if specified in the WHERE clause. The MERGE syntax (and the join implementation depending on the clauses specified) just make it harder to see that this is so.The guidance in Books Online (expanded in the Optimizing Performance entry) offers guidance that will ensure the correct semantic is expressed using
MERGE syntax, without the user necessarily having to understand all the implementation details, or account for the ways in which the optimizer might legitimately rearrange things for execution efficiency reasons.The documentation offers three potential ways to impleme
Code Snippets
DECLARE @CategoryItem AS TABLE
(
CategoryId integer NOT NULL,
ItemId integer NOT NULL,
PRIMARY KEY (CategoryId, ItemId),
UNIQUE (ItemId, CategoryId)
);
DECLARE @DataSource AS TABLE
(
CategoryId integer NOT NULL,
ItemId integer NOT NULL
PRIMARY KEY (CategoryId, ItemId)
);
INSERT @CategoryItem
(CategoryId, ItemId)
VALUES
(1, 1),
(1, 2),
(1, 3),
(2, 1),
(2, 3),
(3, 5),
(3, 6),
(4, 5);
INSERT @DataSource
(CategoryId, ItemId)
VALUES
(2, 2);╔════════════╦════════╗
║ CategoryId ║ ItemId ║
╠════════════╬════════╣
║ 1 ║ 1 ║
║ 2 ║ 1 ║
║ 1 ║ 2 ║
║ 1 ║ 3 ║
║ 2 ║ 3 ║
║ 3 ║ 5 ║
║ 4 ║ 5 ║
║ 3 ║ 6 ║
╚════════════╩════════╝╔════════════╦════════╗
║ CategoryId ║ ItemId ║
╠════════════╬════════╣
║ 2 ║ 2 ║
╚════════════╩════════╝MERGE INTO @CategoryItem AS TARGET
USING @DataSource AS SOURCE ON
SOURCE.ItemId = TARGET.ItemId
AND SOURCE.CategoryId = TARGET.CategoryId
WHEN NOT MATCHED BY SOURCE
AND TARGET.CategoryId = 2
THEN DELETE
WHEN NOT MATCHED BY TARGET
AND SOURCE.CategoryId = 2
THEN INSERT (CategoryId, ItemId)
VALUES (CategoryId, ItemId)
OUTPUT
$ACTION,
ISNULL(INSERTED.CategoryId, DELETED.CategoryId) AS CategoryId,
ISNULL(INSERTED.ItemId, DELETED.ItemId) AS ItemId
;╔═════════╦════════════╦════════╗
║ $ACTION ║ CategoryId ║ ItemId ║
╠═════════╬════════════╬════════╣
║ DELETE ║ 2 ║ 1 ║
║ INSERT ║ 2 ║ 2 ║
║ DELETE ║ 2 ║ 3 ║
╚═════════╩════════════╩════════╝
╔════════════╦════════╗
║ CategoryId ║ ItemId ║
╠════════════╬════════╣
║ 1 ║ 1 ║
║ 1 ║ 2 ║
║ 1 ║ 3 ║
║ 2 ║ 2 ║
║ 3 ║ 5 ║
║ 3 ║ 6 ║
║ 4 ║ 5 ║
╚════════════╩════════╝Context
StackExchange Database Administrators Q#30633, answer score: 120
Revisions (0)
No revisions yet.