HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Why is a hash match operator in this very basic query

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
thiswhyoperatorquerymatchhashverybasic

Problem

I'm beginning to learn some about looking at execution plans and making queries more efficient

Consider these two basic queries

select distinct pat_id, drug_class, drug_name from rx 

select pat_id, drug_class, drug_name from rx


and their execution plans

index being used:

CREATE CLUSTERED INDEX [ix_overlap] ON [dbo].[rx] 
(
    [pat_id] ASC,
    [fill_date] ASC,
    [script_end_date] ASC,
    [drug_name] ASC
)


Even though the first query supposedly has the higher cost by a 4:1 margin it runs faster than the second one. Why is it that a simple distinct added to the query will add the (what I assume to always be bad, corrections are welcome) hash match operator? And why does it have the higher query cost relative to the second query if it runs faster.

Solution

The first query is using a parallel plan, meaning the "work" was split into multiple tasks carried out by multiple threads. The cumulative CPU time was therefore higher than for the serial plan used for your second query.

As to why the distinct causes the hash match operator to appear in the plan; an aggregate or sort operation is required to determine the DISTINCT result. @SQL_Kiwi might pop up with a more indepth explanation shortly but the hash match operator is apparently favoured for larger result sets.


The optimizer tends to prefer the Hash Match Aggregate on larger
rowsets, with fewer groups, where there is no reason to produce a
sorted output, and where the incoming rows are not sorted on the
DISTINCT expression(s). Larger inputs favour hash matching because
the algorithm generally scales well (although it does require a memory
grant) and can make good use of parallelism. Fewer groups are better
for hashing because it means fewer entries in the hash table, and the
memory needed to store unique values is proportional to the number of
groups (and the size of the group). Hash matching does not require or
preserve the order of the incoming row stream. [Source]

Context

StackExchange Database Administrators Q#38961, answer score: 7

Revisions (0)

No revisions yet.