HiveBrain v1.2.0
Get Started
← Back to all entries
gotchasqlMinor

Postgres JOIN strange behaviour

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
joinbehaviourpostgresstrange

Problem

I'm just starting studying Postgres and I'm in a situation where, depending how I do the JOIN on my tables, the performance and plan output seems really strange.

Those are the table used with their indexes:

create table escola
(
    pk_codigo integer not null
        constraint pk_escola
            primary key,
    nome varchar(100),
    municipio varchar(150),
    uf char(2),
    cod_municipio integer,
    uf_id integer default 0 not null
        constraint fk_escola_uf_id
            references tb_uf
)
;

create index idx_escola_uf
    on escola (uf)
;

create index idx_escola_uf_id
    on escola (uf_id)
;

create index idx_multi_escola_uf_pk
    on escola (uf, pk_codigo)
;

create table if not exists candidato
(
    pk_numero_inscricao bigint not null
        constraint candidato_pk
            primary key,
    cod_municipio_residencia integer,
    municipio_residencia varchar(150),
    uf_residencia char(2),
    uf_nascimento char(2),
    situacao_conclusao numeric(1),
    ano_concluiu smallint,
    idade smallint,
    sexo char,
    fk_codigo_escola integer
        constraint fk_candidato_codigo_escola
            references escola,
    uf_prova char(2)
)
;

create index if not exists idx_candidato_codigo_escola
    on candidato (fk_codigo_escola)
;

create table tb_uf
(
    uf varchar(2),
    pk_id serial not null
        constraint tb_uf_pkey
            primary key
)
;

create unique index tb_uf_uf_uindex
    on tb_uf (uf)
;

create unique index tb_uf_pk_id_uindex
    on tb_uf (pk_id)
;


And the queries (with plans):

EXPLAIN ANALYZE
SELECT pk_numero_inscricao, pk_codigo
FROM escola e
  JOIN candidato c
    ON c.fk_codigo_escola = e.pk_codigo
WHERE e.uf = 'RJ'
;


Time without EXPLAIN ANALYZE: 916ms
Plan: https://explain.depesz.com/s/M6B

EXPLAIN ANALYZE
SELECT pk_numero_inscricao, pk_codigo
FROM escola AS e
  JOIN candidato AS c
    ON c.fk_codigo_escola = e.pk_codigo
  JOIN tb_uf AS u
    ON e.uf_id = u.pk_id
WHERE u.uf = 'RJ'
;

Solution

The answer is pretty "searchable" when you notice, that 1st and 3rd query fetch 1.2M rows from candidato ...just to exclude 90% of them from the results.

2nd query returns 1 row from tb_uf, which forces Nested Loop plan.

This means that planner has wrong assumptions about statistics (expected results count) or costs (of random seek). You could either tune these values:

https://www.postgresql.org/message-id/20060926193553.GA27268@oppetid.no

Have Postgresql query planner use nested loop w/ indices over hash join

or manually force a Nested Loop. Although my intuition tells me it should be better to have Hash Join here.

Not having the data, I'd suggest to try one of:

  • set enable_mergejoin = off



  • using CTE to force order of operations, like (3rd query with minimal modifications for easier understanding):



WITH e AS (SELECT * FROM escola WHERE uf_id = 19)
 SELECT pk_numero_inscricao, pk_codigo
 FROM e
   JOIN candidato AS c
     ON c.fk_codigo_escola = e.pk_codigo

Code Snippets

WITH e AS (SELECT * FROM escola WHERE uf_id = 19)
 SELECT pk_numero_inscricao, pk_codigo
 FROM e
   JOIN candidato AS c
     ON c.fk_codigo_escola = e.pk_codigo

Context

StackExchange Database Administrators Q#207225, answer score: 4

Revisions (0)

No revisions yet.