HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Very simple read-only SELECT not run in parallel in postrgresql

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
simplereadpostrgresqlparallelveryselectnotonlyrun

Problem

I am trying to do a simple SELECT on a large table, but it is not being run in parallel:

my_db=> explain (analyze, buffers) select value from my_table;
Seq Scan on my_table (cost=0.00..94433357.34 rows=5441810434 width=4) (actual time=0.029..1519364.787 rows=4897721407 loops=1)
   Buffers: shared hit=1126394 read=38888859 dirtied=1926396 written=284
 Planning time: 2.277 ms
 Execution time: 2468388.489 ms


If I do a max() operation it runs in parallel:

my_db=> explain (analyze, buffers) select max(value) from my_table;
 Finalize Aggregate  (cost=48552456.41..48552456.42 rows=1 width=4) (actual time=1347796.993..1347796.993 rows=1 loops=1)
   Buffers: shared hit=1173785 read=38870105 dirtied=30813
   ->  Gather  (cost=48552455.58..48552456.39 rows=8 width=4) (actual time=1347796.953..1347796.969 rows=9 loops=1)
         Workers Planned: 8
         Workers Launched: 8
         Buffers: shared hit=1173785 read=38870105 dirtied=30813
         ->  Partial Aggregate  (cost=48551455.58..48551455.59 rows=1 width=4) (actual time=1347790.862..1347790.862 rows=1 loops=9)
               Buffers: shared hit=1172674 read=38870104 dirtied=30813
               ->  Parallel Seq Scan on my_table (cost=0.00..46849720.06 rows=680694206 width=4) (actual time=0.213..1048428.195 rows=544206420 loops=9)
                     Buffers: shared hit=1172674 read=38870104 dirtied=30813
 Planning time: 0.761 ms
 Execution time: 1347985.127 ms


Why is the SELECT not done in parallel?

The table has several billion rows. The DBA has done some indexing - I don't know what that really means, but I have read that index scans are not parallel in PostgreSQL 9.6.8, but it is doing a sequential scan anyway.

I am using PostgreSQL 9.6.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit

The table has structure:

```
my_db=> \d my_table
id | bigint | not null default nextval('my_table_id_seq'::regclass)
datetime | timestam

Solution

The query leader is the only process that communicates with the front end. All the SELECTed data for the non-aggregated version of the query has to be marshalled up from the parallel workers to the query leader. That is expensive, so using parallel query here would almost certainly be counter-productive. You might be able to get one anyway by setting parallel_tuple_cost and parallel_setup_cost to zero.

When doing the count(*) aggregate, each parallel worker only needs to send up one piece of information, a partial count, to the leader. This is more obviously worthwhile.

Context

StackExchange Database Administrators Q#210205, answer score: 3

Revisions (0)

No revisions yet.