patternsqlModerate
Why is SELECT * much faster than selecting all columns (in a different column order) by name?
Viewed 0 times
whyordermuchcolumnsallcolumnthanfasterdifferentname
Problem
On a table with columns a,b,c,d,e,f,g,h,i,j,k I get:
And:
Is there a way to make the select by column as fast?
Update:
No index on a table, newly created one
Here is the EXPLAIN ANALYZE, does not seem too helpful:
And:
select * from misty order by a limit 25;
Time: 302.068 msAnd:
select c,b,j,k,a,d,i,g,f,e,h from misty order by a limit 25;
Time: 1258.451 msIs there a way to make the select by column as fast?
Update:
No index on a table, newly created one
Here is the EXPLAIN ANALYZE, does not seem too helpful:
explain analyze select * from misty order by a limit 25;
Limit (cost=43994.40..43994.46 rows=25 width=190) (actual time=404.958..404.971 rows=25 loops=1)
-> Sort (cost=43994.40..45731.11 rows=694686 width=190) (actual time=404.957..404.963 rows=25 loops=1)
Sort Key: a
Sort Method: top-N heapsort Memory: 28kB
-> Seq Scan on misty (cost=0.00..24390.86 rows=694686 width=190) (actual time=0.013..170.945 rows=694686 loops=1)
Total runtime: 405.019 ms
(6 rows)And:
explain analyze select c,b,j,k,a,d,i,g,f,e,h from misty order by a limit 25;
Limit (cost=43994.40..43994.46 rows=25 width=190) (actual time=1371.735..1371.745 rows=25 loops=1)
-> Sort (cost=43994.40..45731.11 rows=694686 width=190) (actual time=1371.733..1371.736 rows=25 loops=1)
Sort Key: a
Sort Method: top-N heapsort Memory: 28kB
-> Seq Scan on misty (cost=0.00..24390.86 rows=694686 width=190) (actual time=0.015..516.355 rows=694686 loops=1)
Total runtime: 1371.797 ms
(6 rows)Solution
This was posted to pgsql-hackers mailing list and I tried to answer in brief there. It seems if the target list (specified columns) matches the tuple descriptor of the relation exactly, that is, both in number of columns and order, then the underlying scan can return a tuple that's directly consumable by the enclosing Sort node. On the other hand, if the target list does not match (either in order or the number of specified columns), the scan returns a form of the tuples that requires Sort's data preparation step to perform extra work (convert from an internal tuple format to the format directly consumable by the sorting code).
By the way, '*' is internally transformed into a list that (intuitively) matches the relation's tuple descriptor.
EDIT:
If you look at your latter EXPLAIN ANALYZE's Seq Scan's actual times, you can see it's more than former's. That happened because the scan performed an additional step of projection (that is, converting heap tuple into an internal values[], nulls[] format). And because that happened, the upper Sort node had to do extra work in its data initialization, that of converting it back to the tuple format that the actual sorting step understands. That's evident from the Sort's startup cost. That doesn't happen in the former case. That is, both the scan returns the tuple as it is and sort's initialization step simply copies it.
By the way, '*' is internally transformed into a list that (intuitively) matches the relation's tuple descriptor.
EDIT:
If you look at your latter EXPLAIN ANALYZE's Seq Scan's actual times, you can see it's more than former's. That happened because the scan performed an additional step of projection (that is, converting heap tuple into an internal values[], nulls[] format). And because that happened, the upper Sort node had to do extra work in its data initialization, that of converting it back to the tuple format that the actual sorting step understands. That's evident from the Sort's startup cost. That doesn't happen in the former case. That is, both the scan returns the tuple as it is and sort's initialization step simply copies it.
Context
StackExchange Database Administrators Q#102403, answer score: 12
Revisions (0)
No revisions yet.