patternsqlModerate
In PostgreSQL is the row ordering preserved in functions and CTEs?
Viewed 0 times
postgresqlthepreservedorderingandfunctionsrowctes
Problem
In SELECT statements the ordering of the returned rows is not guaranteed if the ORDER BY clause is not specified. This is true for "normal" tables.
Is this also true for ordered tables generated using WITH expressions (CTEs)? Ordered tables returned by functions? I assume that not. Is this somewhere explicitly stated in the documentation?
Specifically, can I assume that this (more efficient) query:
will be equivalent to this query:
For functions:
If I have such function:
can I make such wrapper function and assume that the array will always be correctly sorted?
Is this also true for ordered tables generated using WITH expressions (CTEs)? Ordered tables returned by functions? I assume that not. Is this somewhere explicitly stated in the documentation?
Specifically, can I assume that this (more efficient) query:
WITH ordered AS ( SELECT * FROM table1 ORDER BY col1 )
SELECT sum(col2) result FROM
generate_series(0,50) nr,
LATERAL (SELECT * FROM ordered LIMIT 100 OFFSET nr*100) a
GROUP BY nr ORDER BY nr;will be equivalent to this query:
SELECT sum(col2) result FROM
generate_series(0,50) nr,
LATERAL (SELECT * FROM table1 ORDER BY col1 LIMIT 100 OFFSET nr*100) a
GROUP BY nr ORDER BY nr;For functions:
If I have such function:
CREATE FUNCTION do_sort(name text[]) RETURNS TABLE(name text) AS $
SELECT name.name FROM unnest($1) name ORDER BY name.name ASC;
$ LANGUAGE SQL IMMUTABLE;can I make such wrapper function and assume that the array will always be correctly sorted?
CREATE FUNCTION do_sort_returns_array(name text[]) RETURNS text[] AS $
SELECT array_agg(name) FROM do_sort($1);
$ LANGUAGE SQL IMMUTABLE;Solution
In
Now, having that as basis, we should consider that Postgres has implemented CTEs in a peculiar way. They are always materialized (see: PostgreSQL’s CTEs are optimisation fences).
That doesn't mean that you should rely on this. When using a CTE, like in the first query, an
Update, from Craig Ringer, after a comment I made in his above linked blog post:
It’s like quite a few other cases – right now PostgreSQL will always return the rows in the order they’re output by the CTE, but you shouldn’t technically rely on it. Who knows what future features could change that?
Unfortunately IIRC it’s also not smart enough to recognise that the rows are ordered correctly if you add another
So, if you want the results ordered in a specific ordering, add an
Another reason for explicitly using
Now, another way to write the query would be adding a
This way, the CTE will only be materialized for the 5100 rows you want and not for all (possibly millions) rows of the table. Even better if there is an index on
SELECT statements the ordering of the returned rows is not guaranteed if the ORDER BY clause is not specified. This is true for all tables, simple or complicated queries.Now, having that as basis, we should consider that Postgres has implemented CTEs in a peculiar way. They are always materialized (see: PostgreSQL’s CTEs are optimisation fences).
That doesn't mean that you should rely on this. When using a CTE, like in the first query, an
ORDER BY should be added as well when referencing the CTE, as the ORDER BY inside the CTE may be removed (I'm not saying that it will always be removed as my tests show that a sort is performed. But it may be removed in a future optimizer change, as an ORDER BY without LIMIT is redundant):WITH ordered AS ( SELECT * FROM table1 ORDER BY col1 ) -- redundant ORDER BY
SELECT sum(col2) result FROM
generate_series(0,50) nr,
LATERAL (SELECT * FROM ordered ORDER BY col1 -- ORDER BY added
LIMIT 100 OFFSET nr*100) a
GROUP BY nr ORDER BY nr;Update, from Craig Ringer, after a comment I made in his above linked blog post:
It’s like quite a few other cases – right now PostgreSQL will always return the rows in the order they’re output by the CTE, but you shouldn’t technically rely on it. Who knows what future features could change that?
Unfortunately IIRC it’s also not smart enough to recognise that the rows are ordered correctly if you add another
ORDER BY in the outer query. (Haven’t tested, but pretty sure). So it’s not necessarily free to do it the “right” way.So, if you want the results ordered in a specific ordering, add an
ORDER BY in the final SELECT. You could get away without it now (and you might get some slight performance gain) but there is no guarantee that this will not change in a future release.Another reason for explicitly using
ORDER BY is that you may not be always maintaining the code you write now. Another developer may try to reorganize the query without CTEs, using derived tables or LATERAL joins. They would need to know that your query relies in the order provided by the CTE so it needs to be either used in the final SELECT or commented or documented somewhere.Now, another way to write the query would be adding a
LIMIT in the CTE and using ROW_NUMBER():WITH ordered AS
( SELECT col2, (ROW_NUMBER() OVER (ORDER BY col1) - 1) / 100 AS nr
FROM table1
ORDER BY col1 LIMIT 5100 )
SELECT n.nr, sum(o.col2) AS result
FROM generate_series(0, 50) AS n (nr)
LEFT JOIN ordered AS o
ON n.nr = o.nr
GROUP BY n.nr
ORDER BY n.nr ;This way, the CTE will only be materialized for the 5100 rows you want and not for all (possibly millions) rows of the table. Even better if there is an index on
(col1, col2). You could even remove the generate_series() and/or remove the CTE:SELECT o.nr, sum(o.col2) AS result
FROM
( SELECT col2, (ROW_NUMBER() OVER (ORDER BY col1) - 1) / 100 AS nr
FROM table1
ORDER BY col1 LIMIT 5100
) AS o
GROUP BY o.nr
ORDER BY o.nr ;Code Snippets
WITH ordered AS ( SELECT * FROM table1 ORDER BY col1 ) -- redundant ORDER BY
SELECT sum(col2) result FROM
generate_series(0,50) nr,
LATERAL (SELECT * FROM ordered ORDER BY col1 -- ORDER BY added
LIMIT 100 OFFSET nr*100) a
GROUP BY nr ORDER BY nr;WITH ordered AS
( SELECT col2, (ROW_NUMBER() OVER (ORDER BY col1) - 1) / 100 AS nr
FROM table1
ORDER BY col1 LIMIT 5100 )
SELECT n.nr, sum(o.col2) AS result
FROM generate_series(0, 50) AS n (nr)
LEFT JOIN ordered AS o
ON n.nr = o.nr
GROUP BY n.nr
ORDER BY n.nr ;SELECT o.nr, sum(o.col2) AS result
FROM
( SELECT col2, (ROW_NUMBER() OVER (ORDER BY col1) - 1) / 100 AS nr
FROM table1
ORDER BY col1 LIMIT 5100
) AS o
GROUP BY o.nr
ORDER BY o.nr ;Context
StackExchange Database Administrators Q#130552, answer score: 12
Revisions (0)
No revisions yet.