patternsqlModerate
Can I optimize this MERGE statement?
Viewed 0 times
thiscanstatementmergeoptimize
Problem
I am attempting a single column merge between two tables. The first table (
In my development environment, the query takes just under 8 minutes. But in the production environment, it should take significantly less (much more powerful machine). However, I anticipate the query taking at least 2 minutes to run in production. I know that this query was causing timeouts for other developers in the development environment, which means it could easily cause timeouts for customers. Is there a safer and/or faster way to perform this query?
VisitorSession) has 40,000,000 rows. The second (ShoppingCart) has 9,000,000 rows.In my development environment, the query takes just under 8 minutes. But in the production environment, it should take significantly less (much more powerful machine). However, I anticipate the query taking at least 2 minutes to run in production. I know that this query was causing timeouts for other developers in the development environment, which means it could easily cause timeouts for customers. Is there a safer and/or faster way to perform this query?
declare @dt datetime = cast(dateadd(month, -6, getdate()) as date);
merge ShoppingCart as TargetTable -- 07:55 to complete in Dev
using
(
select * from -- 04:55 to run select, resulting in 12,727,927 rows in Dev
(
select
visitorid -- int, not null, foreign key
,useripaddress -- varchar(55), null
,row_number() over
(partition by visitorid order by createdate desc) as [row]
from VisitorSession (nolock)
where UserIPAddress is not null
and CreateDate > @dt -- createdate is a datetime, not null
) as subTbl
where subTbl.[row] = 1
) as SourceTable
on (TargetTable.VisitorID = SourceTable.VisitorID) -- visitorid is not a primary key
when matched
then update set
TargetTable.UserIpAddress = SourceTable.UserIpAddress;Solution
Personally, I don't like
I'm not trying to be a fear-mongerer here. But I also find the syntax unintuitive and daunting. So I would only use it in cases where it's actually needed and I can prove that I'm not affected by any of the above issues. I don't know what I would ever possibly gain from using it for an operation that could only ever end in an
So here is how I would do it instead, using syntax that is much more familiar to me:
You can also break this operation into chunks to reduce the impact on the transaction log which may, in turn, reduce the overall duration. I blogged about this here.
Here is how I would handle that approach:
Of course, as the blog post illustrates, you can gain just about as much time by making sure your log is big enough to handle the entire transaction without having to grow - most of the delay is likely coming from many, many autogrow operations accommodating your large transaction. Sadly until you've done the operation once it can be very difficult to try and guess how much transaction log you'll need...
MERGE because there are many unresolved bugs:- This blog post (scroll down to "Other
MERGEissues")
- This answer by AlexKuznetsov
- These posts on Paul White's blog
- I published a cautionary tip about
MERGEhere, including references to several bugs that haven't been addressed.
- And this list may not even be exhaustive - given the number of bugs discovered by this small set of users, I have to wonder how many haven't been discovered yet?
MERGE also gives a false sense of security in terms of optimistic concurrency and race conditions. See Dan Guzman's blog post for more details.I'm not trying to be a fear-mongerer here. But I also find the syntax unintuitive and daunting. So I would only use it in cases where it's actually needed and I can prove that I'm not affected by any of the above issues. I don't know what I would ever possibly gain from using it for an operation that could only ever end in an
UPDATE anyway. So here is how I would do it instead, using syntax that is much more familiar to me:
;WITH s AS
(
SELECT VisitorID, UserIpAddress FROM
(
SELECT
VisitorID,
UserIpAddress,
rn = ROW_NUMBER() OVER (PARTITION BY VisitorID ORDER BY CreateDate DESC)
FROM dbo.VisitorSession
WHERE UserIpAddress IS NOT NULL
AND CreateDate > @dt
) AS x
WHERE rn = 1
)
UPDATE c
SET c.UserIpAddress = s.UserIpAddress
FROM dbo.ShoppingCart AS c
INNER JOIN s
ON c.VisitorID = s.VisitorID;You can also break this operation into chunks to reduce the impact on the transaction log which may, in turn, reduce the overall duration. I blogged about this here.
Here is how I would handle that approach:
DECLARE
@dt DATE = DATEADD(MONTH, -6, SYSDATETIME()),
@rc INT = 1;
WHILE @rc > 0
BEGIN
BEGIN TRANSACTION;
;WITH s AS
(
SELECT TOP (100000) VisitorID, UserIpAddress FROM
(
SELECT
VisitorID,
UserIpAddress,
rn = ROW_NUMBER() OVER (PARTITION BY VisitorID ORDER BY CreateDate DESC)
FROM dbo.VisitorSession AS s
WHERE UserIpAddress IS NOT NULL
AND CreateDate > @dt
AND EXISTS
(
SELECT 1 FROM dbo.ShoppingCart AS c
WHERE c.VisitorID = s.VisitorID
AND (c.UserIpAddress <> s.UserIpAddress
OR c.UserIpAddress IS NULL)
)
) AS x
WHERE rn = 1
)
UPDATE c
SET c.UserIpAddress = s.UserIpAddress
FROM dbo.ShoppingCart AS c
INNER JOIN s
ON c.VisitorID = s.VisitorID;
SET @rc = @@ROWCOUNT;
COMMIT TRANSACTION;
ENDOf course, as the blog post illustrates, you can gain just about as much time by making sure your log is big enough to handle the entire transaction without having to grow - most of the delay is likely coming from many, many autogrow operations accommodating your large transaction. Sadly until you've done the operation once it can be very difficult to try and guess how much transaction log you'll need...
Code Snippets
;WITH s AS
(
SELECT VisitorID, UserIpAddress FROM
(
SELECT
VisitorID,
UserIpAddress,
rn = ROW_NUMBER() OVER (PARTITION BY VisitorID ORDER BY CreateDate DESC)
FROM dbo.VisitorSession
WHERE UserIpAddress IS NOT NULL
AND CreateDate > @dt
) AS x
WHERE rn = 1
)
UPDATE c
SET c.UserIpAddress = s.UserIpAddress
FROM dbo.ShoppingCart AS c
INNER JOIN s
ON c.VisitorID = s.VisitorID;DECLARE
@dt DATE = DATEADD(MONTH, -6, SYSDATETIME()),
@rc INT = 1;
WHILE @rc > 0
BEGIN
BEGIN TRANSACTION;
;WITH s AS
(
SELECT TOP (100000) VisitorID, UserIpAddress FROM
(
SELECT
VisitorID,
UserIpAddress,
rn = ROW_NUMBER() OVER (PARTITION BY VisitorID ORDER BY CreateDate DESC)
FROM dbo.VisitorSession AS s
WHERE UserIpAddress IS NOT NULL
AND CreateDate > @dt
AND EXISTS
(
SELECT 1 FROM dbo.ShoppingCart AS c
WHERE c.VisitorID = s.VisitorID
AND (c.UserIpAddress <> s.UserIpAddress
OR c.UserIpAddress IS NULL)
)
) AS x
WHERE rn = 1
)
UPDATE c
SET c.UserIpAddress = s.UserIpAddress
FROM dbo.ShoppingCart AS c
INNER JOIN s
ON c.VisitorID = s.VisitorID;
SET @rc = @@ROWCOUNT;
COMMIT TRANSACTION;
ENDContext
StackExchange Database Administrators Q#44067, answer score: 17
Revisions (0)
No revisions yet.