patternsqlMinor
Use array expression from subquery in ANY() condition
Viewed 0 times
expressionfromconditionarrayanysubqueryuse
Problem
I have a relatively complicated query with a subquery fetching an array like so:
and would like to use the array 'cats' in a later WHERE condition like
But this does not work as it states that the column 'cats' doesn't exist.
c/p'ing the subquery into the
...
ARRAY(SELECT category_id FROM category_schedule_con con
WHERE s.id = con.schedule_id ORDER BY category_id) AS cats,
...and would like to use the array 'cats' in a later WHERE condition like
...
WHERE 4 = ANY(cats)
...But this does not work as it states that the column 'cats' doesn't exist.
c/p'ing the subquery into the
ANY clause yields the expected result.Solution
Explanation
By definition in the SQL standard (which Postgres implements) you can reference output columns in
An output column's name can be used to refer to the column's value in
there you must write out the expression instead.
Related:
Obviously, your subquery is a correlated subquery expression in the
To avoid repeating lengthy / expensive expressions in the
Chances are, your query can be much more efficient ...
Better query
Applying the above, this query would work:
Now you can reference the column alias
Using
You can apply an
More importantly, pull the predicate (the
Depending on data distribution, a
About
But it should be the fastest to invert the logic: start by finding
Index
Be sure to have a multicolumn index like:
For the last query, we'd need the inverted sequence of columns:
and another one with two columns switched back again. Two indexes on just
A PK or UNIQUE constraint on both columns serves as well.
By definition in the SQL standard (which Postgres implements) you can reference output columns in
ORDER BY or GROUP BY, but not in the WHERE or HAVING clause. The manual:An output column's name can be used to refer to the column's value in
ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING clauses;there you must write out the expression instead.
Related:
- PostgreSQL reusing computation result in select query
- PostgreSQL Where count condition
- GROUP BY + CASE statement
Obviously, your subquery is a correlated subquery expression in the
SELECT list (which is hidden in the question due to over-simplification).To avoid repeating lengthy / expensive expressions in the
WHERE clause you can use a subquery in the FROM list. A correlated subquery in the SELECT list cannot be referenced by alias in the WHERE clause, that's just an output column like any other.Chances are, your query can be much more efficient ...
Better query
Applying the above, this query would work:
SELECT s.*, con.cats
FROM some_table s -- guessing the missing query
JOIN (
SELECT schedule_id
, array_agg(category_id ORDER BY category_id) AS cats
FROM category_schedule_con
GROUP BY 1
) con ON con.schedule_id = s.id
WHERE 3 = ANY(cats);Now you can reference the column alias
cats in the WHERE clause. But this query is terribly inefficient for big tables for multiple reasons. Most importantly, the predicate is not sargable.Using
GROUP BY and the aggregate function array_agg() instead of the ARRAY constructor because we are producing multiple arrays in a single query.You can apply an
ORDER BY clause to almost any aggregate function, but a sorted subquery typically performs better:SELECT s.*, con.cats
FROM some_table s
JOIN (
SELECT id, array_agg(category_id) AS cats
FROM (
SELECT schedule_id AS id, category_id -- alias id for convenience
FROM category_schedule_con
ORDER BY 1, 2 -- to get ordered list per schedule_id
) con
GROUP BY 1
) con USING (id)
WHERE 3 = ANY(cats);More importantly, pull the predicate (the
WHERE condition) down into the subquery to make it possible to use an index and exclude irrelevant rows early. Much faster with big tables:SELECT s.*, con.cats
FROM some_table s
JOIN (
SELECT id, array_agg(category_id) AS cats
FROM (
SELECT schedule_id AS id, category_id
FROM category_schedule_con c
WHERE EXISTS (
SELECT 1 FROM category_schedule_con
WHERE schedule_id = c.schedule_id
AND category_id = 3
)
ORDER BY 1, 2
) con
GROUP BY 1
) con USING (id);Depending on data distribution, a
LATERAL join (requires Postgres 9.3+) may be more efficient:SELECT s.*, con.cats
FROM some_table s
, LATERAL (
SELECT ARRAY (
SELECT category_id
FROM category_schedule_con
WHERE schedule_id = s.id
ORDER BY 1
) AS cats
) con
WHERE EXISTS (
SELECT 1 FROM category_schedule_con
WHERE schedule_id = s.id
AND category_id = 3
);About
LATERAL:- What is the difference between LATERAL and a subquery in PostgreSQL?
But it should be the fastest to invert the logic: start by finding
schedule_id that have category_id = 3, self-join to category_schedule_con and aggregate before joining to the other table:SELECT s.*, con.cats
FROM (
SELECT id, array_agg(c.category_id) AS cats
FROM (
SELECT schedule_id AS id
FROM category_schedule_con
WHERE category_id = 3
) x
JOIN category_schedule_con c USING (id)
GROUP BY id
) con
JOIN some_table s USING (id);Index
Be sure to have a multicolumn index like:
CREATE INDEX category_schedule_con_foo_idx ON category_schedule_con
(schedule_id, category_id);For the last query, we'd need the inverted sequence of columns:
CREATE INDEX category_schedule_con_bar_idx ON category_schedule_con
(category_id, schedule_id);and another one with two columns switched back again. Two indexes on just
(category_id) and (schedule_id) would work fast, too;A PK or UNIQUE constraint on both columns serves as well.
Code Snippets
SELECT s.*, con.cats
FROM some_table s -- guessing the missing query
JOIN (
SELECT schedule_id
, array_agg(category_id ORDER BY category_id) AS cats
FROM category_schedule_con
GROUP BY 1
) con ON con.schedule_id = s.id
WHERE 3 = ANY(cats);SELECT s.*, con.cats
FROM some_table s
JOIN (
SELECT id, array_agg(category_id) AS cats
FROM (
SELECT schedule_id AS id, category_id -- alias id for convenience
FROM category_schedule_con
ORDER BY 1, 2 -- to get ordered list per schedule_id
) con
GROUP BY 1
) con USING (id)
WHERE 3 = ANY(cats);SELECT s.*, con.cats
FROM some_table s
JOIN (
SELECT id, array_agg(category_id) AS cats
FROM (
SELECT schedule_id AS id, category_id
FROM category_schedule_con c
WHERE EXISTS (
SELECT 1 FROM category_schedule_con
WHERE schedule_id = c.schedule_id
AND category_id = 3
)
ORDER BY 1, 2
) con
GROUP BY 1
) con USING (id);SELECT s.*, con.cats
FROM some_table s
, LATERAL (
SELECT ARRAY (
SELECT category_id
FROM category_schedule_con
WHERE schedule_id = s.id
ORDER BY 1
) AS cats
) con
WHERE EXISTS (
SELECT 1 FROM category_schedule_con
WHERE schedule_id = s.id
AND category_id = 3
);SELECT s.*, con.cats
FROM (
SELECT id, array_agg(c.category_id) AS cats
FROM (
SELECT schedule_id AS id
FROM category_schedule_con
WHERE category_id = 3
) x
JOIN category_schedule_con c USING (id)
GROUP BY id
) con
JOIN some_table s USING (id);Context
StackExchange Database Administrators Q#110743, answer score: 6
Revisions (0)
No revisions yet.