HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Are results from UNION ALL clauses always appended in order?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
appendedorderallareunionalwaysresultsfromclauses

Problem

As per standard SQL UNION / UNION ALL do not guarantee any particular sort order without an outer ORDER BY clause - like there is hardly any place in SQL where sort order is guaranteed without ORDER BY.

However, Postgres uses an "Append" step for plain cases of UNION ALL, hence results from the first leg (even if unsorted among their partition) always come before the next leg, etc. Postgres simply appends results from each leg in the given order. This is particularly relevant with a LIMIT clause:

SELECT 1 FROM tbl  -- or any complex query
UNION ALL
SELECT 2
LIMIT  1


Obviously this does not hold for UNION (without ALL). But other than that, I have never seen Postgres return out of order, i.e. '2' from the above query, while the first SELECT would also return row(s). Not even if the first leg is extremely expensive.

I have based queries on this behavior in the past. Now I have met a claim Postgres might return rows out of order here, but unsubstantiated with actual proof.

The current Postgres manual has this to say on the matter:

UNION effectively appends the result of query2 to the result
of query1 (although there is no guarantee that this is the
order in which the rows are actually returned). Furthermore, it
eliminates duplicate rows from its result, in the same way as
DISTINCT, unless UNION ALL is used.

This is pretty unclear. Does the cited order apply to the list of SELECT clauses, or rows within each clause, or just the returned set? Also, UNION ALL is only mentioned in the second sentence, so it remains unclear if the all-important first sentence is supposed to apply to UNION ALL ...

Can anybody show an example, where rows are returned out of order, breaking the sequence of UNION ALL clauses? In any version of Postgres. (Even if the latest version would be most interesting.)

If not that, is there reason to believe this might change in a future version?

ORDER BY is not the immediate issue here. The question

Solution

There was a similar question recently in pgsql-docs mailing list,

Clarify the ordering guarantees in combining queries (or lack thereof):

I was trying to understand what - if any - are the guarantees with regards
to ordering for combining queries (UNION/UNION ALL/...). From this
message1, it seems that UNION ALL does preserve the ordering of the
operand queries, whereas UNION does not (presumably neither do INTERSECT,
INTERSECT ALL, EXCEPT and EXCEPT ALL).

The documentation[2] makes no mention of this, I'd suggest adding a note
clarifying this.

Tom Lane (and others) replied:

Since the documentation doesn't make a guarantee there is none. If you
want ordered output use ORDER BY.

No, there is no guarantee. It's just that UNION ALL works this way today
(preserving the order of the subselects) - and I'm not even sure about
that, it may not preserve the order in all cases, with different indexes or
partitioning or a parallel plan, etc.
In any case, there is no guarantee that the behaviour will not change in
the future due to planner improvements.

Yeah, that. You can get a parallelized plan today for UNION ALL:

=# explain analyze select * from foo union all select * from foo;
                                                                 QUERY PLAN                                                            

--------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=0.00..208552.05 rows=5120008 width=244) (actual time=0.652..390.135 rows=5120000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Append  (cost=0.00..208552.05 rows=2133336 width=244) (actual time=0.021..228.848 rows=1706667 loops=3)
         ->  Parallel Seq Scan on foo  (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.453..78.084 rows=853333 loops=3)
         ->  Parallel Seq Scan on foo foo_1  (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.024..125.299 rows=1280000 loops=2)
 Planning Time: 0.094 ms
 Execution Time: 488.352 ms


It's true that in simple non-parallelized cases we'll do the first query then the second, but SQL doesn't promise that to be true and neither does Postgres.

Code Snippets

=# explain analyze select * from foo union all select * from foo;
                                                                 QUERY PLAN                                                            

--------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=0.00..208552.05 rows=5120008 width=244) (actual time=0.652..390.135 rows=5120000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Append  (cost=0.00..208552.05 rows=2133336 width=244) (actual time=0.021..228.848 rows=1706667 loops=3)
         ->  Parallel Seq Scan on foo  (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.453..78.084 rows=853333 loops=3)
         ->  Parallel Seq Scan on foo foo_1  (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.024..125.299 rows=1280000 loops=2)
 Planning Time: 0.094 ms
 Execution Time: 488.352 ms

Context

StackExchange Database Administrators Q#316818, answer score: 15

Revisions (0)

No revisions yet.