principlesqlMinor
Hash Join vs Hash Semi Join
Viewed 0 times
joinsemihash
Problem
PostgreSQL 9.2
I'm trying to understand the difference between
Here're two queries:
I
II
As can be seen, the only difference in the plans is that in the first case, the hastable consumes
But I don't understand the difference in the hashtable size. The
I'm trying to understand the difference between
Hash Semi Join and just Hash Join.Here're two queries:
I
EXPLAIN ANALYZE SELECT * FROM orders WHERE customerid IN (SELECT
customerid FROM customers WHERE state='MD');
Hash Semi Join (cost=740.34..994.61 rows=249 width=30) (actual time=2.684..4.520 rows=120 loops=1)
Hash Cond: (orders.customerid = customers.customerid)
-> Seq Scan on orders (cost=0.00..220.00 rows=12000 width=30) (actual time=0.004..0.743 rows=12000 loops=1)
-> Hash (cost=738.00..738.00 rows=187 width=4) (actual time=2.664..2.664 rows=187 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 7kB
-> Seq Scan on customers (cost=0.00..738.00 rows=187 width=4) (actual time=0.018..2.638 rows=187 loops=1)
Filter: ((state)::text = 'MD'::text)
Rows Removed by Filter: 19813II
EXPLAIN ANALYZE SELECT * FROM orders o JOIN customers c ON o.customerid = c.customerid WHERE c.state = 'MD'
Hash Join (cost=740.34..1006.46 rows=112 width=298) (actual time=2.831..4.762 rows=120 loops=1)
Hash Cond: (o.customerid = c.customerid)
-> Seq Scan on orders o (cost=0.00..220.00 rows=12000 width=30) (actual time=0.004..0.768 rows=12000 loops=1)
-> Hash (cost=738.00..738.00 rows=187 width=268) (actual time=2.807..2.807 rows=187 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 37kB
-> Seq Scan on customers c (cost=0.00..738.00 rows=187 width=268) (actual time=0.018..2.777 rows=187 loops=1)
Filter: ((state)::text = 'MD'::text)
Rows Removed by Filter: 19813As can be seen, the only difference in the plans is that in the first case, the hastable consumes
7kB, but in the second 37kB and that the node is Hash Semi Join.But I don't understand the difference in the hashtable size. The
Hash node uses perfectly the same Seq Scan node having the same Filter. Why is there the difference?Solution
In the first query, only the customer_id needs to be saved from the
In the second query, all of the columns need to be stored into the hash table, because you are selecting all of the columns from the table (using
customers into the hash table, because that is the only data needed to implement the semi-join.In the second query, all of the columns need to be stored into the hash table, because you are selecting all of the columns from the table (using
*) rather than just testing for existence of the customer_id.Context
StackExchange Database Administrators Q#123264, answer score: 7
Revisions (0)
No revisions yet.