HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Interpreting startup time and varying plans for Postgres seq scans

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresvaryingseqinterpretingtimescansforandstartupplans

Problem

In asking a recent question, some mysterious startup time components came up in my EXPLAIN ANALYZE output. I was playing further, and discovered that the startup time drops to near 0 if I remove the regex WHERE clause.

I ran the following bash script as a test:

for i in $(seq 1 10)
do
    if (( $RANDOM % 2 == 0 ))
    then
        echo "Doing plain count"
        psql -e -c "EXPLAIN ANALYZE SELECT count(*) FROM ui_events_v2"
    else
        echo "Doing regex count"
        psql -e -c "EXPLAIN ANALYZE SELECT count(*) FROM ui_events_v2 WHERE page ~ 'foo'"
    fi
done


The first query returns a count of ~30 million, and the second counts only 7 rows. They are running on a PG 12.3 read replica in RDS with minimal other activity. Both versions take roughly the same amount of time, as I'd expect. Here is some output filtered with grep:

Doing plain count
               ->  Parallel Seq Scan on ui_events_v2  (cost=0.00..3060374.07 rows=12632507 width=0) (actual time=0.086..38622.215 rows=10114306 loops=3)
Doing regex count
               ->  Parallel Seq Scan on ui_events_v2  (cost=0.00..3091955.34 rows=897 width=0) (actual time=16856.679..41398.062 rows=2 loops=3)
Doing plain count
               ->  Parallel Seq Scan on ui_events_v2  (cost=0.00..3060374.07 rows=12632507 width=0) (actual time=0.162..39454.499 rows=10114306 loops=3)
Doing plain count
               ->  Parallel Seq Scan on ui_events_v2  (cost=0.00..3060374.07 rows=12632507 width=0) (actual time=0.036..39213.171 rows=10114306 loops=3)
Doing regex count
               ->  Parallel Seq Scan on ui_events_v2  (cost=0.00..3091955.34 rows=897 width=0) (actual time=12711.308..40015.734 rows=2 loops=3)
Doing plain count
               ->  Parallel Seq Scan on ui_events_v2  (cost=0.00..3060374.07 rows=12632507 width=0) (actual time=0.244..39277.683 rows=10114306 loops=3)
Doing regex count
^CCancel request sent


So, a few questions:

-
What goes into this startup component of "actual time" in

Solution

-
The secret is in the Rows Removed by Filter: 10115028:

It takes the sequential scan 17 seconds to find and return the first result row.

-
The optimizer has no idea how long it takes until the first row passes the filter. Since it doesn't make any difference for the quality of the plan, it just sets the startup cost to 0.

-
Both plans work the same: each of the three processes scans a third of the table and counts the rows (the Partial Aggregate that returns 1 row). In the first case, 10 million rows are counted by each worker.

These three numbers are gathered and summed in the Finalize Aggregate stage.

Context

StackExchange Database Administrators Q#280466, answer score: 3

Revisions (0)

No revisions yet.