HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

Merge join asks for seemingly unnecessary sort before joining

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
unnecessarybeforemergejoinseeminglyforsortasksjoining

Problem

I use one table to update another table. Both the tables have a clustered index where the first 4 columns are the same type of data.

CREATE UNIQUE CLUSTERED INDEX [i109139_I_109139CLI_44C9176D64C84ECB959C0EAA6AE7A4F4] ON [dbo].[t109139_44C9176D64C84ECB959C0EAA6AE7A4F4]
(
    [PARTITION] ASC,
    [DATAAREAID] ASC,
    [ITEMID] ASC,
    [INVENTSERIALID] ASC,
    [RECID] ASC
)
GO

ALTER TABLE [dbo].[INVENTSERIAL] ADD  CONSTRAINT [I_1204ITEMSERIALIDX] PRIMARY KEY CLUSTERED 
(
    [PARTITION] ASC,
    [DATAAREAID] ASC,
    [ITEMID] ASC,
    [INVENTSERIALID] ASC
)


The following is the query I use to update the tables

UPDATE T1
    SET NOTFORPROJECTS = T2.LANNOTFORPROJECTS
FROM tempdb."DBO".t109139_44C9176D64C84ECB959C0EAA6AE7A4F4 T1
CROSS JOIN LANINVENTSERIALVIEW T2
WHERE T1.PARTITION = 5637144576
  AND T1.DATAAREAID = N'lan'
  AND T2.PARTITION = 5637144576
  AND T2.DATAAREAID = N'lan'
  AND T2.INVENTSERIALID = T1.INVENTSERIALID
  AND T2.ITEMID = T1.ITEMID
  AND T2.DATAAREAID = T1.DATAAREAID
  AND T2.PARTITION = T1.PARTITION


When I look at the queryplan, I notice that it does use the clustered indexes of both the tables, but it performs a hash match instead of a merge join. I find this unexpected as the used data should both be sorted on the equality operators.

So then I try to force a Merge join, and it seems that SQL Server still uses both the clustered indexes, but first sorts the data from the two tables from ItemId,InventSerialId to InventSerialId,ItemId and then uses that to perform a merge join. So it basically swaps the two columns.

Do note that in the table InventSerial, only 1 InventSerialId is an empty string, and in the other table that I update, there will be many empty-string (not null) InventSerialIds. I have no idea if this is relevant.

Query plan:

https://www.brentozar.com/pastetheplan/?id=H1L4j7EF3

Solution

I can repro this too (Fiddle).

If you change

AND T2.INVENTSERIALID = T1.INVENTSERIALID
  AND T2.ITEMID = T1.ITEMID


To

AND T2.ITEMID = T1.ITEMID  
  AND T2.INVENTSERIALID = T1.INVENTSERIALID


It is able to produce the desired plan (no sort operators in the final query in the Fiddle).

Somewhat unsatisfying that you have to do this but previously with MERGE join and MERGE union I have also found that tweaks are needed to get the hoped for plan (example).

Paul White previously commented on the above example

Finding optimal ordering is NP-hard so database engines rely on
heuristics.

Code Snippets

AND T2.INVENTSERIALID = T1.INVENTSERIALID
  AND T2.ITEMID = T1.ITEMID
AND T2.ITEMID = T1.ITEMID  
  AND T2.INVENTSERIALID = T1.INVENTSERIALID

Context

StackExchange Database Administrators Q#328999, answer score: 4

Revisions (0)

No revisions yet.