HiveBrain v1.2.0
Get Started
← Back to all entries
snippetMinor

Distinct Sort Operator with no Order, Distinct, Grouping

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
distinctorderoperatorgroupingwithsort

Problem

I have two queries that are mostly the same bar a where expression. They are UNION ALL'd together for a complete result set. There is no grouping, nor ordering, nor distinct in the queries. However, directly prior to the final concatenation/select operator and after any joins sit two distinct sorts. I'm a bit confused by these distinct sorts given the aforementioned info.

Can anyone provide insight into why this operator is in this plan?

FWIW, I have rewritten this query, just trying to fill in knowledge gaps/correct misunderstandings.

Thanks

https://www.brentozar.com/pastetheplan/?id=Sy2RJrd7h

Solution

The query is written as two similar UNION ALL branches - with one of them being as follows.

SELECT EventKey,
       cast(ms.AdmitDateTime AS DATE) AS StagingAdmitDate,
       EventAdmitDate,
       ms.DischargeDateTime
FROM   #Events AS e
       JOIN CM.Staging AS ms
         ON ms.UniqueCaseIdentifier = e.UniqueCaseIdentifier
            AND ms.HospitalID = e.HospitalID
WHERE  EXISTS(SELECT 1
              FROM   #UpdatedStagingRecords AS i
              WHERE  ms.StagingKey = i.StagingKey)
       AND e.DischargeDateTime IS NULL


The execution plan for that branch is more along the following lines (*).

SELECT DISTINCT EventKey,
                cast(ms.AdmitDateTime AS DATE) AS StagingAdmitDate,
                EventAdmitDate,
                ms.DischargeDateTime
FROM   #UpdatedStagingRecords AS i
       INNER JOIN CM.Staging AS ms
               ON ms.StagingKey = i.StagingKey
       INNER JOIN #Events e
               ON ms.UniqueCaseIdentifier = e.UniqueCaseIdentifier
                  AND ms.HospitalID = e.HospitalID
                  AND e.DischargeDateTime IS NULL


Instead of driving the query by doing the join on #Events (741 rows) and CM.Staging (120907 rows) and then doing a semi join to see if there was a match in #UpdatedStagingRecords it drives the query from #UpdatedStagingRecords itself (2 rows) and does two index seeks in CM.Staging.

The join cannot just simply be changed to an inner join and left like that because duplicate rows for a StagingKey in #UpdatedStagingRecords could lead to duplicate rows in the result that would not be there with the original exists syntax.

So these are removed by the DISTINCT-ification step.

The same applies to the other branch of the UNION too but it is so similar I haven't addressed that specifically.

(*) - The actual duplicate removal by the DISTINCT is not something that can be easily expressed by representing with SQL syntax. It is not actually doing a DISTINCT based on the projected column values. It is doing it based on Bmknnnn column which maps to the physical row identifier from the #Events table.

Code Snippets

SELECT EventKey,
       cast(ms.AdmitDateTime AS DATE) AS StagingAdmitDate,
       EventAdmitDate,
       ms.DischargeDateTime
FROM   #Events AS e
       JOIN CM.Staging AS ms
         ON ms.UniqueCaseIdentifier = e.UniqueCaseIdentifier
            AND ms.HospitalID = e.HospitalID
WHERE  EXISTS(SELECT 1
              FROM   #UpdatedStagingRecords AS i
              WHERE  ms.StagingKey = i.StagingKey)
       AND e.DischargeDateTime IS NULL
SELECT DISTINCT EventKey,
                cast(ms.AdmitDateTime AS DATE) AS StagingAdmitDate,
                EventAdmitDate,
                ms.DischargeDateTime
FROM   #UpdatedStagingRecords AS i
       INNER JOIN CM.Staging AS ms
               ON ms.StagingKey = i.StagingKey
       INNER JOIN #Events e
               ON ms.UniqueCaseIdentifier = e.UniqueCaseIdentifier
                  AND ms.HospitalID = e.HospitalID
                  AND e.DischargeDateTime IS NULL

Context

StackExchange Database Administrators Q#326527, answer score: 4

Revisions (0)

No revisions yet.