HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Nested Loop with Materialize resulting in very slow query on postgresql 9.5

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlwithqueryloopslowmaterializeresultingnestedvery

Problem

I have an analytics database on Postgres 9.5
I'm taking advantage of constraint partitioning with table inheritance to split up events into monthly tables. The base (parent) table contains no rows.

I have a relatively simple query but the planner is coming up with a completely insane plan where it materializes within a nested loop as can be seen from the explain.

The query

select count(person_id) as thecount from (
    select distinct A.person_id from event_page as O 
    join alias as A on (O.person_alias = A.alias) 
    where O.timestamp_ between '2017-4-28 04:00:00' and '2017-4-29 03:59:59.999' 
    and O.location_host = 'www.foo.com' AND O.location_path= '/ca/sale' 
) as alias_3494697


The explain: https://explain.depesz.com/s/IoO

My theory is that table inheritance with constraint partitioning is tripping up the planner somehow. If I replace the base table with the monthly table for April I get a sane plan with good performance:

https://explain.depesz.com/s/GpbU

NOTE The slight difference from the above query - i'm using the child table instead of the base table (ie no inheritance at play)

select count(person_id) as thecount from (
    select distinct A.person_id from event_page_2017_4 as O 
    join alias as A on (O.person_alias = A.alias) 
    where O.timestamp_ between '2017-4-28 04:00:00' and '2017-4-29 03:59:59.999' 
    and O.location_host = 'www.foo.com' AND O.location_path= '/ca/sale' 
) as alias_3494697


All stats are up to date via vacuum analyze. The relevant schema is below:

```
CREATE TABLE event_page (
id CHAR(24) PRIMARY KEY,
timestamp_ TIMESTAMP NOT NULL,
person_alias VARCHAR(128) NOT NULL,
visitor_id VARCHAR(128) NOT NULL,
session_id VARCHAR(24) NOT NULL,
ip_address VARCHAR(64) ,
user_agent VARCHAR(256) ,
operating_system VARCHAR(64) ,
device_type VARCHAR(64) ,
browse

Solution

A couple of observations:

1.

The query plan for the slow query shows:


Seq Scan on event_page o (cost=10,000,000,000.00..10,000,000,000.00 rows=1 width=274)

... which indicates that you ran with SET enable_seqscan = off;. And that means your Postgres version positively could not find any other way than the sequential scan.

2.

Did you enable constraint_exclusion like the manual advises here:



  • Ensure that the constraint_exclusion configuration parameter is not disabled in postgresql.conf. If it is, queries will not be optimized


as desired.


3.

The day boundaries in your query are not in sync with the partitioning. (But that has no immediate effect for the given query, see comments.)

The CHECK constraint reads:

CHECK ( timestamp_ >= '2017-4-01'::timestamp AND timestamp_ < '2017-05-01'::timestamp )


There may be confusion with timestamp vs. timestamptz and / or time zones. Or something got lost in translation and the question is misleading in that respect.

Either way, the expression in your WHERE clause opens up a sneaky corner case (even if your day boundaries were in sync):

where O.timestamp_ between '2017-4-28 04:00:00' and '2017-4-29 03:59:59.999' 


Postgres timestamps are implemented as 8-byte integers allowing 6 decimal places. A row with '2017-4-29 03:59:59.9995' would not behave as expected.
Details:

  • Ignoring timezones altogether in Rails and PostgreSQL



Plus, this does not play well with constraint exclusion. More partitions than necessary may have to be read.

Use instead:

WHERE o.timestamp_ >= '2017-4-28 04:00' -- include lower bound
AND   o.timestamp_ <  '2017-4-29 04:00' -- exclude upper bound


You may want get "days" in your partitions in sync with your queries - 00:00 vs. 04:00 (or the evil 03:59:59.999) - and maybe use timestamptz to begin with.

4.

I see for the index-only scan on alias:


Heap Fetches: 1918543

... which is exceptionally high. There may or may not be problems with the visibility map. (See also: Updating The Visibility Map) Did you run VACUUM FULL on the table(s)? (Compare this thread on plpgsql-performance.) Try plain VACUUM, which updates the visibility map properly. But that's all hard to say, since you seem to have been running with enable_seqscan = off. Not sure how this plays out in combination with all the other stuff at work here.

There have been various improvements to VACUUM for Postgres 9.6 (among other things), this may explain the difference - or what you observed in pg 9.6 is coincidence and due to other factors.

5.

If alias.alias is unique (and referential integrity can be assumed), the query can be simplified. Just count(DISTINCT o.person_alias) without even joining to table alias at all. Like:

SELECT count(DISTINCT o.person_alias) AS thecount
FROM   event_page o
WHERE  o.timestamp_ >= '2017-4-28 04:00' -- include lower bound
AND    o.timestamp_ <  '2017-4-29 04:00' -- exclude upper bound
AND    o.location_host = 'www.foo.com'
AND    o.location_path = '/ca/sale';


(Or use the subquery like you had it, may be faster than count(DISTINCT ...).)

If alias.alias is not unique, your query may be wrong / ambiguous.

Why do you have person_alias in table event_page and not person_id to begin with?

6.

You use the data type CHAR(24) as PK for your tables, which is almost certainly a bad choice. Consider a serial or bigserial column or maybe a uuid. Numeric types are smaller and faster and not burdened by collation rules, varying byte length, etc. Related:

-
Would index lookup be noticeably faster with char vs varchar when all values are 36 chars

-
What is the overhead for varchar(n)?

Code Snippets

CHECK ( timestamp_ >= '2017-4-01'::timestamp AND timestamp_ < '2017-05-01'::timestamp )
where O.timestamp_ between '2017-4-28 04:00:00' and '2017-4-29 03:59:59.999'
WHERE o.timestamp_ >= '2017-4-28 04:00' -- include lower bound
AND   o.timestamp_ <  '2017-4-29 04:00' -- exclude upper bound
SELECT count(DISTINCT o.person_alias) AS thecount
FROM   event_page o
WHERE  o.timestamp_ >= '2017-4-28 04:00' -- include lower bound
AND    o.timestamp_ <  '2017-4-29 04:00' -- exclude upper bound
AND    o.location_host = 'www.foo.com'
AND    o.location_path = '/ca/sale';

Context

StackExchange Database Administrators Q#172286, answer score: 4

Revisions (0)

No revisions yet.