HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Why the WHERE clause is not pushed down in the view's query?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
whythepushedwherequeryviewdownnotclause

Problem

With Postgres 9.4, I'm doing the following query quite often:

SELECT DISTINCT ON(recipient) * FROM messages
LEFT JOIN identities ON messages.recipient = identities.name
WHERE timestamp BETWEEN timeA AND timeB
ORDER BY recipient, timestamp DESC;


So I decided to create a view:

CREATE VIEW myView AS SELECT DISTINCT ON(recipient) * FROM messages
LEFT JOIN identities ON messages.recipient = identities.name
ORDER BY recipient, timestamp DESC;


I just realized if I query my view like SELECT * FROM myView WHERE timestamp BETWEEN timeA AND timeB I get a significantly worse performance.

Doing EXPLAIN ANALYZE on both queries, I found out the reason is that the database in the second case brings up all the records, does the left join and then applies the WHERE clause. In other words, the WHERE clause is not pushed down into the view's query. I also tried to remove the ORDER BY from the view, but still the database performs the LEFT JOIN on full data rather on the filtered set.

What is the reason of this behavior? Is there a way I can get a comparable performance when using view?

Solution

This query:

SELECT DISTINCT ON(recipient) * FROM messages
LEFT JOIN identities ON messages.recipient = identities.name
WHERE messages.timestamp BETWEEN timeA AND timeB
ORDER BY recipient, timestamp DESC;


says:

For all messages between timeA and timeB, find the recipients and for every recipient, find one message (the latest in between timeA and timeB).

This query (which you would get if you use the view):

SELECT *
FROM 
  ( SELECT DISTINCT ON(recipient) * FROM messages
    LEFT JOIN identities ON messages.recipient = identities.name
    ORDER BY recipient, timestamp DESC
  ) AS myView
WHERE timestamp BETWEEN timeA AND timeB;


says:

For all messages, find the recipients and for every recipient, find one message, (the latest in all times) and then, show only those messages that are between timeA and timeB.

As a result, the first query will show messages that are between time A and
B which the second query will not show (because there may be one or more messages for the same recipient, later than time B).

So, the queries are logically different and the condition cannot (and shouldn't) be pushed down for your view.

If you want to have parameters passed to your view, look the two answers (by @a_horse_with_no_name and @Erwin Brandstetter) in this question for how to use a set returning function: Pass In “WHERE” parameters to PostgreSQL View?

Code Snippets

SELECT DISTINCT ON(recipient) * FROM messages
LEFT JOIN identities ON messages.recipient = identities.name
WHERE messages.timestamp BETWEEN timeA AND timeB
ORDER BY recipient, timestamp DESC;
SELECT *
FROM 
  ( SELECT DISTINCT ON(recipient) * FROM messages
    LEFT JOIN identities ON messages.recipient = identities.name
    ORDER BY recipient, timestamp DESC
  ) AS myView
WHERE timestamp BETWEEN timeA AND timeB;

Context

StackExchange Database Administrators Q#110460, answer score: 13

Revisions (0)

No revisions yet.