patternsqlMinor
Union implemented with Hash Match operator
Viewed 0 times
operatorwithunionimplementedmatchhash
Problem
I was reviewing the SQL Server physical operators listed on TechNet (don't judge, you know you've done it) and read that the Hash Match physical operator is sometimes used to implement the
I have never seen that done and would like to learn more. An example query would be great. When is it used and when is it better than the alternatives? (Those are usually the same, but not always.)
UNION logical operator.I have never seen that done and would like to learn more. An example query would be great. When is it used and when is it better than the alternatives? (Those are usually the same, but not always.)
Solution
I cannot recall seeing a hash match (union) operator in the wild so I can't speak authoritatively as to when they are better than the alternatives. It's possible to force one using the
For the union operator, use the first input to build the hash table (removing duplicates). Use the second input (which must have no duplicates) to probe the hash table, returning all rows that have no matches, then scan the hash table and return all entries.
So how can we create a query with a hash match (union) operator as the option with the lowest cost? Well, hash join tends to scale much better with parallelism than merge join so a query running in parallel could help tip the scales towards hash match. We need the second input to have no duplicates so a unique constraint on a table could help, but unique constraints are implemented as indexes so that will help a merge join as well. Perhaps giving the hash table lots of duplicates will favor the hash match over the concatenation option because we'll be doing a smaller effective sort?
After some trial and error, one approach that works on my machine is to insert one million rows with 10000 distinct values into one table and one million distinct values into another table. Sample code:
The following query has a hash match (union) operator with a total estimated cost of 12.3812 units:
Adding a
I did a few test runs and usually the hash match (union) operator query was just barely faster than the others. Here are results from the most recent run:
{ CONCAT | HASH | MERGE } UNION query hint but let's try to create a real example. Quote from the documentation referenced in the question:For the union operator, use the first input to build the hash table (removing duplicates). Use the second input (which must have no duplicates) to probe the hash table, returning all rows that have no matches, then scan the hash table and return all entries.
So how can we create a query with a hash match (union) operator as the option with the lowest cost? Well, hash join tends to scale much better with parallelism than merge join so a query running in parallel could help tip the scales towards hash match. We need the second input to have no duplicates so a unique constraint on a table could help, but unique constraints are implemented as indexes so that will help a merge join as well. Perhaps giving the hash table lots of duplicates will favor the hash match over the concatenation option because we'll be doing a smaller effective sort?
After some trial and error, one approach that works on my machine is to insert one million rows with 10000 distinct values into one table and one million distinct values into another table. Sample code:
CREATE TABLE X_NUM_SMALL (ID INT NOT NULL);
GO
INSERT INTO X_NUM_SMALL WITH (TABLOCK)
SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2;
GO 100
CREATE TABLE X_NUM_1000000_UQ (ID INT NOT NULL);
INSERT INTO X_NUM_1000000_UQ WITH (TABLOCK)
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2;
ALTER TABLE X_NUM_1000000_UQ
ADD CONSTRAINT UC_X_NUM_1000000 UNIQUE (ID);
SET STATISTICS IO, TIME ON;The following query has a hash match (union) operator with a total estimated cost of 12.3812 units:
SELECT *
FROM X_NUM_SMALL
UNION
SELECT *
FROM X_NUM_1000000_UQ
OPTION (MAXDOP 4);Adding a
MERGE UNION hint to the query increases the cost just barely to 12.6551 optimizer units. Exchanging that hint for a CONCAT UNION hint increases the cost further to 17.2215 optimizer units.I did a few test runs and usually the hash match (union) operator query was just barely faster than the others. Here are results from the most recent run:
╔════════════╦══════════╦══════════════╗
║ UNION TYPE ║ CPU TIME ║ ELAPSED TIME ║
╠════════════╬══════════╬══════════════╣
║ HASH ║ 657 ║ 2279 ║
║ MERGE ║ 312 ║ 2375 ║
║ CONCAT ║ 906 ║ 2459 ║
╚════════════╩══════════╩══════════════╝Code Snippets
CREATE TABLE X_NUM_SMALL (ID INT NOT NULL);
GO
INSERT INTO X_NUM_SMALL WITH (TABLOCK)
SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2;
GO 100
CREATE TABLE X_NUM_1000000_UQ (ID INT NOT NULL);
INSERT INTO X_NUM_1000000_UQ WITH (TABLOCK)
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2;
ALTER TABLE X_NUM_1000000_UQ
ADD CONSTRAINT UC_X_NUM_1000000 UNIQUE (ID);
SET STATISTICS IO, TIME ON;SELECT *
FROM X_NUM_SMALL
UNION
SELECT *
FROM X_NUM_1000000_UQ
OPTION (MAXDOP 4);╔════════════╦══════════╦══════════════╗
║ UNION TYPE ║ CPU TIME ║ ELAPSED TIME ║
╠════════════╬══════════╬══════════════╣
║ HASH ║ 657 ║ 2279 ║
║ MERGE ║ 312 ║ 2375 ║
║ CONCAT ║ 906 ║ 2459 ║
╚════════════╩══════════╩══════════════╝Context
StackExchange Database Administrators Q#167086, answer score: 7
Revisions (0)
No revisions yet.