HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Query performance of index scans slower than parallel seq scan on Postgres

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
scanpostgresseqqueryslowerthanscansparallelperformanceindex

Problem

I've been trying to debug a performance problem on PostgreSQL 10 on AWS Aurora.

Specifically I'm trying to understand performance problems on "cold" queries where the buffer cache does not contain cached pages.

I have a table with an index and when the planner uses the index the query is substantially slower (spending more time on IO from what I can tell). However, when I drop an index the query is faster, it uses a parallel sequential scan.

The Schema of the table:

CREATE TABLE max_test_2018_3
(
   timestamp_         timestamp,
   person_alias       varchar(128),
   visitor_id         varchar(128),
   session_id         varchar(24),
   page_title         varchar(1024),
   location_host      varchar(256),
   location_path      varchar(1024),
   location_query     varchar(1024),
   location_fragment  varchar(1024),
   referrer_host      varchar(256),
   referrer_path      varchar(1024),
   referrer_query     varchar(1024),
   referrer_fragment  varchar(1024),
   duration           integer,
   location_secure    boolean,
   referrer_secure    boolean,
   id                 uuid,
   person_id          uuid,
   custom_properties  jsonb
);


The index DDL is:

CREATE INDEX max_test_2018_3_ix   ON max_test_2018_3
  USING btree  ( location_path varchar_pattern_ops);


And the query is:

explain (analyze, buffers)
SELECT O.person_id, 1 as step1, min(timestamp_) AS timestamp_ 
FROM max_test_2018_3 O
WHERE O.location_path  like '/tax-calculator%' 
GROUP BY O.person_id


Check the following two explains - note that both explains show minimal or no buffer hits. One shows the execution with the index and the other shows execution after the index has been dropped:

No index -> https://explain.depesz.com/s/N4uo
With index -> https://explain.depesz.com/s/69WS

The plan that includes the index scan takes 193 seconds vs 43 seconds for a sequential scan.

Why is step 2 in the "With-index" plan so slow? It reads in only 118372 buffers (about 118 MB of data), y

Solution

All your time is spent doing table I/O, which is to be expected in a test with a cold cache.

The index scan takes longer, but only because it is not parallelized. The sequential scan uses more total time, but since it is run on 5 cores in parallel, it finishes faster.

PostgreSQL v10 can perform parallel index scans, so the remaining riddle is why it doesn't. A side riddle is why the sequential scan uses so many parallel workers.

I suspect that you messed up the configuration parameters. The relevant parameters are:

-
max_parallel_workers_per_gather: limits how many workers can be used. You must have set this to 4 or more.

-
min_parallel_table_scan_size: if a table is bigger than that, a parallel worker is planned. If the table size exceeds 3n-1 times that value, n parallel workers are planned. So either your table is very big, or you reduced the parameter. Alternatively:

-
The storage parameter parallel_workers on the table overrides the calculation based on min_parallel_table_scan_size as described above, so maybe you set that.

-
Finally, min_parallel_index_scan_size governs when a parallel index scan is considered. Either the index is small, or you lowered the parameter.

I'd expect that the index scan will be faster once you set these parameters to good values.

Context

StackExchange Database Administrators Q#242918, answer score: 4

Revisions (0)

No revisions yet.