patternsqlMinor
PostgreSQL: Materialize as inner node of Merge Join in query plan
Viewed 0 times
postgresqlmergenodequeryjoinmaterializeplaninner
Problem
I'm learning PostgreSQL EXPLAIN plan nodes. Currently, I investigate Materialize node.
Here is a query I found in the blog post (https://www.depesz.com/2013/05/09/explaining-the-unexplainable-part-3/) and a plan I obtained by myself (structurally it is the same as in the blog post):
I can't understand the argumentation of the blog author about why Postgres is using Materialize here.
... Merge Join has to match several criteria. Some are obvious (data has to be sorted) and some are not so obvious as are more technical (data has to be scrollable back and forth).
Because of this (these not so obvious criteria) sometimes Pg will have to Materialize the data coming from source (Index Scan in our case) so
Here is a query I found in the blog post (https://www.depesz.com/2013/05/09/explaining-the-unexplainable-part-3/) and a plan I obtained by myself (structurally it is the same as in the blog post):
set work_mem= '1GB';
explain analyze select * from
(select * from pg_class order by oid) as c
join
(select * from pg_attribute a order by attrelid) as a
on c.oid = a.attrelid;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=34.27..333.37 rows=2913 width=504) (actual time=0.823..28.413 rows=2913 loops=1)
Merge Cond: (pg_class.oid = a.attrelid)
-> Sort (cost=33.99..34.97 rows=395 width=265) (actual time=0.739..1.084 rows=395 loops=1)
Sort Key: pg_class.oid
Sort Method: quicksort Memory: 130kB
-> Seq Scan on pg_class (cost=0.00..16.95 rows=395 width=265) (actual time=0.038..0.285 rows=395 loops=1)
-> Materialize (cost=0.28..257.05 rows=2913 width=239) (actual time=0.060..11.702 rows=2913 loops=1)
-> Index Scan using pg_attribute_relid_attnum_index on pg_attribute a (cost=0.28..220.63 rows=2913 width=239) (actual time=0.050..6.827 rows=2913 loops=1)
Planning Time: 1.472 ms
Execution Time: 29.617 ms
(10 rows)I can't understand the argumentation of the blog author about why Postgres is using Materialize here.
... Merge Join has to match several criteria. Some are obvious (data has to be sorted) and some are not so obvious as are more technical (data has to be scrollable back and forth).
Because of this (these not so obvious criteria) sometimes Pg will have to Materialize the data coming from source (Index Scan in our case) so
Solution
Why Merge Join needs data to be scrollable back and forth? According to my understanding of Merge Join it iterates forward both data sets simultaneously with two pointers. There is no case in Merge Join algorithm when it goes backward.
So the first node produces a 'cat'. The 2nd node scans (ignoring results) until it finds a 'cat' or greater, produces results until it sees a >'cat', then pauses. Now the first node produce another 'cat'. Now, what do you think the 2nd node should do?
In my case the inner input set is produced by Index Scan, so according to the book, there should be no Materialize node in this plan.
What if it just thinks using the materialize will be faster?(Well, that is not the case here as Laurenz pointed out)
So the first node produces a 'cat'. The 2nd node scans (ignoring results) until it finds a 'cat' or greater, produces results until it sees a >'cat', then pauses. Now the first node produce another 'cat'. Now, what do you think the 2nd node should do?
In my case the inner input set is produced by Index Scan, so according to the book, there should be no Materialize node in this plan.
What if it just thinks using the materialize will be faster?(Well, that is not the case here as Laurenz pointed out)
Context
StackExchange Database Administrators Q#302165, answer score: 4
Revisions (0)
No revisions yet.