HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMajor

What's the order of operations in PostgreSQL?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqltheoperationsorderwhat

Problem

I'm a DB student, and I executed the following query to learn a couple things at the same time (LEFT/RIGHT JOIN, UNION, WHERE + RegEx).
What's troubling me is the order of execution. I have two tables, as such:

create table practicaleft(
    id smallint primary key,
    nombre varchar,
    cumple date
);

create table practicaright(
    id smallint primary key,
    apellido varchar,
    cumpleanios date
);


Then, I insert some random data:

INSERT INTO practicaleft VALUES
(1, 'John', CURRENT_DATE - 1),
(5, 'Alice', CURRENT_DATE - 5),
(3, 'Bob', CURRENT_DATE - 3),
(7, 'Eva', CURRENT_DATE - 7);

INSERT INTO practicaright VALUES
(5, 'Doe', CURRENT_DATE - 5),
(6, 'Smith', CURRENT_DATE - 6),
(3, 'Johnson', CURRENT_DATE - 3),
(4, 'Brown', CURRENT_DATE - 4);


Afterwards, I execute this query:

select id, nombre
from practicaleft
where nombre similar to 'A%'
union
select pr.id, pr.apellido
from practicaright pr
where pr.id = 4 or pr.apellido ilike '_o%'
union all
select id, apellido
from practicaright 
where cumpleanios > current_date - 5;


The results? Here you go:

4   "Brown"
5   "Alice"
5   "Doe"
3   "Johnson"
3   "Johnson"
4   "Brown"


TL;DR: this query is divided in three parts, and results are merged with the operator UNION ALL.

Now comes the question. One might believe this is executed instruction by instruction, and so, the order should be:

5   "Alice"
5   "Doe"
3   "Johnson"
4   "Brown"
3   "Johnson"
4   "Brown"


But that isn't happening. The only way to fix that is if I add some random string as a field, like so:

select id, nombre, 'part1' as query_part
from practicaleft
where nombre similar to 'A%'
union
select pr.id, pr.apellido, 'part2' as query_part
from practicaright pr
where pr.id = 4 or pr.apellido ilike '_o%'
union all
select id, apellido, 'part3' as query_part
from practicaright 
where cumpleanios > current_date - 25;


What is happening? Did I skip over some truly important SQL mechanic?

Solution

Jasen's answer is correct - PostgreSQL is free to return the rows in any order it pleases unless you add an ORDER BY clause:

(SELECT ... UNION SELECT ... UNION ALL SELECT ...) ORDER BY ...;


But let me explain why PostgreSQL doesn't return the rows in the order you expect. The reason is that the first UNION is not UNION ALL. If you had used UNION ALL everywhere, PostgreSQL would execute the query like this:
EXPLAIN (COSTS OFF)
select id, nombre
from practicaleft
where nombre similar to 'A%'
union all
select pr.id, pr.apellido
from practicaright pr
where pr.id = 4 or pr.apellido ilike '_o%'
union all
select id, apellido
from practicaright
where cumpleanios > current_date - 5;

QUERY PLAN
══════════════════════════════════════════════════════════════════
Append
-> Seq Scan on practicaleft
Filter: ((nombre)::text ~ '^(?:A.*)$'::text)
-> Seq Scan on practicaright pr
Filter: ((id = 4) OR ((apellido)::text ~~* '_o%'::text))
-> Seq Scan on practicaright
Filter: (cumpleanios > (CURRENT_DATE - 5))
(7 rows)


That is, PostgreSQL would execute the three queries and simply append the results, and you would end up with the ordering you expected.

But you used union the first time, and union eliminates duplicates. This is executed as follows:
EXPLAIN (COSTS OFF)
select id, nombre
from practicaleft
where nombre similar to 'A%'
union
select pr.id, pr.apellido
from practicaright pr
where pr.id = 4 or pr.apellido ilike '_o%'
union all
select id, apellido
from practicaright
where cumpleanios > current_date - 5;

QUERY PLAN
══════════════════════════════════════════════════════════════════════════════
Append
-> HashAggregate
Group Key: practicaleft.id, practicaleft.nombre
-> Append
-> Seq Scan on practicaleft
Filter: ((nombre)::text ~ '^(?:A.*)$'::text)
-> Seq Scan on practicaright pr
Filter: ((id = 4) OR ((apellido)::text ~~* '_o%'::text))
-> Seq Scan on practicaright
Filter: (cumpleanios > (CURRENT_DATE - 5))
(10 rows)


PostgreSQL uses a hash aggregate to remove duplicates from the first two branches. The result rows are returned in the order they happen to have in the hash table, which is pretty random (good hash functions behave like that).

Code Snippets

(SELECT ... UNION SELECT ... UNION ALL SELECT ...) ORDER BY ...;

Context

StackExchange Database Administrators Q#333106, answer score: 22

Revisions (0)

No revisions yet.