patternsqlCritical
Are WHERE clauses applied in the order they are written?
Viewed 0 times
theorderwrittenarewhereappliedtheyclauses
Problem
I'm trying to optimize a query which looks into a big table (37 millions rows) and have a question about what order the operations are executed in a query.
Are the
Is it a good way to put the most restrictive clauses first to avoid big loops for other clauses, in order to do a faster execution?
Now queries takes so much time to execute.
select 1
from workdays day
where day.date_day >= '2014-10-01'
and day.date_day 'A'
and province.id_region in ('10' ,'15' ,'21' ,'26' ,'31' , ...,'557')
and province.id_cr in ('9' ,'14' ,'20' ,'25' ,'30' ,'35' ,'37')
)Are the
WHERE clauses for the date range executed before the subquery?Is it a good way to put the most restrictive clauses first to avoid big loops for other clauses, in order to do a faster execution?
Now queries takes so much time to execute.
Solution
To elaborate on @alci's answer:
PostgreSQL doesn't care what order you write things in
-
PostgreSQL doesn't care at all about the order of entries in a
-
The order in which joins are written is also ignored up to the configured
-
Subqueries can be executed before or after the query that contains them, depending on what's fastest, so long as the subquery is executed before the outer query actually needs the information. Often in reality the subquery gets executed kind-of in the middle, or interleaved with the outer query.
-
There's no guarantee PostgreSQL will actually execute parts of the query at all. They can be completely optimized away. This is important if you call functions with side-effects.
PostgreSQL will transform your query
PostgreSQL will heavily transform queries while retaining the exact same effects, in order to make them run faster while not changing the results.
-
Terms outside a subquery can get pushed down into the subquery so they execute as part of the subquery not where you wrote them in the outer query
-
Terms in the subquery can be pulled up to the outer query so their execution is done as part of the outer query, not where you wrote them in the subquery
-
The subquery can, and often is, flattened into a join on the outer table. The same is true of things like
-
Views get flattened into the query that uses the view
-
SQL functions often get inlined into the calling query
-
... and there are numerous other transformations made to queries, such as constant expression pre-evaluation, de-correlation of some subqueries, and all sorts of other planner/optimizer tricks.
In general PostgreSQL can massively transform and rewrite your query, to the point where each of these queries:
will usually all produce exactly the same query plan. (Assuming I didn't make any dumb mistakes in the above anyway).
It's not uncommon to try to optimize a query only to find that the query planner has already figured out the tricks you're trying and applied them automatically, so the hand-optimized version is no better than the original.
Limitations
The planner/optimizer is far from omnicient, and is limited by the requirement to be absolutely certain it can't change the effects of the query, the available data to make decisions with, the rules that've been implemented, and the CPU time it can afford to spend pondering the optimizations. For example:
-
The planner relies on statistics kept by
-
The statistics are only a sample, so they can be misleading due to sampling effects, especially if too small a sample is taken. Bad plan choices can result.
-
The statistics don't keep track of some kinds of data about the table, like correlations between columns. This can lead the planner to make bad decisions when it assumes things are independent when they aren't.
-
The planner relies on cost parameters like
-
Any subquery with a
-
CTE terms (the clauses in a
-
PostgreSQL has a limited ability to optimize across queries on foreign tables,
-
PostgreSQL won't inline a function written in anything except plain SQL, nor do pullup/pushdown between it and the outer query.
-
The planner/optimizer is really dumb about selecting expression indexes, and about trivial data type differences between index and expression.
Tons more, too.
Your query
In the case of your query:
```
select 1
from workdays day
where day.date_day >= '2014-10-01'
and day.date_day 'A'
and province.id_region in ('10' ,'15' ,'21
PostgreSQL doesn't care what order you write things in
-
PostgreSQL doesn't care at all about the order of entries in a
WHERE clause, and chooses indexes and execution order based on cost and selectivity estimation alone.-
The order in which joins are written is also ignored up to the configured
join_collapse_limit; if there are more joins than that, it'll execute them in the order they're written.-
Subqueries can be executed before or after the query that contains them, depending on what's fastest, so long as the subquery is executed before the outer query actually needs the information. Often in reality the subquery gets executed kind-of in the middle, or interleaved with the outer query.
-
There's no guarantee PostgreSQL will actually execute parts of the query at all. They can be completely optimized away. This is important if you call functions with side-effects.
PostgreSQL will transform your query
PostgreSQL will heavily transform queries while retaining the exact same effects, in order to make them run faster while not changing the results.
-
Terms outside a subquery can get pushed down into the subquery so they execute as part of the subquery not where you wrote them in the outer query
-
Terms in the subquery can be pulled up to the outer query so their execution is done as part of the outer query, not where you wrote them in the subquery
-
The subquery can, and often is, flattened into a join on the outer table. The same is true of things like
EXISTS and NOT EXISTS queries.-
Views get flattened into the query that uses the view
-
SQL functions often get inlined into the calling query
-
... and there are numerous other transformations made to queries, such as constant expression pre-evaluation, de-correlation of some subqueries, and all sorts of other planner/optimizer tricks.
In general PostgreSQL can massively transform and rewrite your query, to the point where each of these queries:
select my_table.*
from my_table
left join other_table on (my_table.id = other_table.my_table_id)
where other_table.id is null;
select *
from my_table
where not exists (
select 1
from other_table
where other_table.my_table_id = my_table.id
);
select *
from my_table
where my_table.id not in (
select my_table_id
from other_table
where my_table_id is not null
);will usually all produce exactly the same query plan. (Assuming I didn't make any dumb mistakes in the above anyway).
It's not uncommon to try to optimize a query only to find that the query planner has already figured out the tricks you're trying and applied them automatically, so the hand-optimized version is no better than the original.
Limitations
The planner/optimizer is far from omnicient, and is limited by the requirement to be absolutely certain it can't change the effects of the query, the available data to make decisions with, the rules that've been implemented, and the CPU time it can afford to spend pondering the optimizations. For example:
-
The planner relies on statistics kept by
ANALYZE (usually via autovacuum). If these are outdated, the plan choice can be bad.-
The statistics are only a sample, so they can be misleading due to sampling effects, especially if too small a sample is taken. Bad plan choices can result.
-
The statistics don't keep track of some kinds of data about the table, like correlations between columns. This can lead the planner to make bad decisions when it assumes things are independent when they aren't.
-
The planner relies on cost parameters like
random_page_cost to tell it the relative speed of various operations on the particular system it's installed on. These are only guides. If they're badly wrong they can lead to poor plan choices.-
Any subquery with a
LIMIT or OFFSET cannot be flattened or be subject to pullup / pushdown. This doesn't mean it'll execute before all parts of the outer query, though, or even that it'll execute at all.-
CTE terms (the clauses in a
WITH query) are always executed in their entirety, if they're executed at all. They can't be flattened, and terms can't be pushed up or pulled down across the CTE term barrier. CTE terms are always executed before the final query. This is non-SQL-standard behaviour, but it's documented as how PostgreSQL does things.-
PostgreSQL has a limited ability to optimize across queries on foreign tables,
security_barrier views, and certain other special kinds of relation-
PostgreSQL won't inline a function written in anything except plain SQL, nor do pullup/pushdown between it and the outer query.
-
The planner/optimizer is really dumb about selecting expression indexes, and about trivial data type differences between index and expression.
Tons more, too.
Your query
In the case of your query:
```
select 1
from workdays day
where day.date_day >= '2014-10-01'
and day.date_day 'A'
and province.id_region in ('10' ,'15' ,'21
Code Snippets
select my_table.*
from my_table
left join other_table on (my_table.id = other_table.my_table_id)
where other_table.id is null;
select *
from my_table
where not exists (
select 1
from other_table
where other_table.my_table_id = my_table.id
);
select *
from my_table
where my_table.id not in (
select my_table_id
from other_table
where my_table_id is not null
);select 1
from workdays day
where day.date_day >= '2014-10-01'
and day.date_day <= '2015-09-30'
and day.offer_id in (
select offer.offer_day
from offer
inner join province on offer.id_province = province.id_province
inner join center cr on cr.id_cr = province.id_cr
where upper(offer.code_status) <> 'A'
and province.id_region in ('10' ,'15' ,'21' ,'26' ,'31' , ...,'557')
and province.id_cr in ('9' ,'14' ,'20' ,'25' ,'30' ,'35' ,'37')
)select 1
from workdays day
inner join offer on day.offer_id = offer.offer_day
inner join province on offer.id_province = province.id_province
inner join center cr on cr.id_cr = province.id_cr
where upper(offer.code_status) <> 'A'
and province.id_region in ('10' ,'15' ,'21' ,'26' ,'31' , ...,'557')
and province.id_cr in ('9' ,'14' ,'20' ,'25' ,'30' ,'35' ,'37')
and day.date_day >= '2014-10-01'
and day.date_day <= '2015-09-30';Context
StackExchange Database Administrators Q#115723, answer score: 101
Revisions (0)
No revisions yet.