principlesqlMajor
Undesirable Nest Loop vs. Hash Join in PostgreSQL 9.6
Viewed 0 times
postgresqlundesirablenestloopjoinhash
Problem
I've a trouble with PostgreSQL 9.6 query planning. My query looks like this:
I have Row-Level Security enabled for the above-used tables.
-
with
Nested Loop joining with a total running time of about 37 seconds: https://explain.depesz.com/s/59BR
-
with
I did
I know that it is not a good practice to
Rewriting the query is an option.
If I run the same query under a role that bypasses RLS, then it is executed very fast. The row-level security policy looks like this:
Any ideas or suggestions would be greatly appreciated.
SET role plain_user;
SELECT properties.*
FROM properties
JOIN entries_properties
ON properties.id = entries_properties.property_id
JOIN structures
ON structures.id = entries_properties.entry_id
WHERE structures."STRUKTURBERICHT" != ''
AND properties."COMPOSITION" LIKE 'Mo%'
AND (
properties."NAME" LIKE '%VASP-ase-preopt%'
OR properties."CALCULATOR_ID" IN (7,22,25)
)
AND properties."TYPE_ID" IN (6)I have Row-Level Security enabled for the above-used tables.
-
with
set enable_nestloop = True, the query planner runsNested Loop joining with a total running time of about 37 seconds: https://explain.depesz.com/s/59BR
-
with
set enable_nestloop = False, the Hash Join method is used and the query time is about 0.3 sec: https://explain.depesz.com/s/PG8EI did
VACUUM ANALYZE before running the queries, but it didn't help.I know that it is not a good practice to
set enable_nestloop = False, and any other similar options for the planner. But how could I "convince" the planner to use hash joins without disabling nested loops?Rewriting the query is an option.
If I run the same query under a role that bypasses RLS, then it is executed very fast. The row-level security policy looks like this:
CREATE POLICY properties_select
ON properties
FOR SELECT
USING (
(
properties.ouid = get_current_user_id()
AND properties.ur
)
OR (
properties.ogid in (select get_current_groups_id())
AND properties.gr
)
OR properties.ar
);Any ideas or suggestions would be greatly appreciated.
Solution
What's happening here is the Nested Loop is way off on one side. Nested Loops work really well when one side is very small, such as returning one row. In your query, the planner fumbles here and estimates that a Hash Join will return just one row. Instead, that Hash Join (property_id = id) returns 1,338 rows. This forces 1,338 loops to run on the other side of the Nested Loop which already has 3,444 rows. That's a hella-lot when you're only expecting one (which isn't even much of a "loop"). Anywayy..
Further examination as we move down shows that the Hash Join is really borked by the estimations arising from this,
PostgreSQL expects that to return one row. But it doesn't. And, that's really your problem. So some options here, that don't involve taking out a sledge hammer and disabling
-
You can add an index or two to
-
Alternatively, you can move the properties stuff to a CTE or subselect with
Further examination as we move down shows that the Hash Join is really borked by the estimations arising from this,
Filter: (((properties."COMPOSITION")::text ~~ 'Mo%'::text) AND (((properties."NAME")::text ~~ '%VASP-ase-preopt%'::text) OR (properties."CALCULATOR_ID" = ANY ('{7,22,25}'::integer[]))))PostgreSQL expects that to return one row. But it doesn't. And, that's really your problem. So some options here, that don't involve taking out a sledge hammer and disabling
nested_loop-
You can add an index or two to
properties to help it potentially skip the seq scan entirely, or better estimate the return.CREATE INDEX ON properties USING ( "TYPE_ID", "CALCULATOR_ID" );
-- the gist_trgm_ops may or may not be needed depending on selectivity of above.
CREATE INDEX ON properties USING GIST (
"COMPOSITION" gist_trgm_ops,
"NAME" gist_trgm_ops
);
ANALYZE properties;-
Alternatively, you can move the properties stuff to a CTE or subselect with
OFFSET 0 which creates a fence.WITH t AS (
SELECT *
FROM properties.
WHERE "COMPOSITION" LIKE 'Mo%'
AND (
"NAME" LIKE '%VASP-ase-preopt%'
OR "CALCULATOR_ID" IN (7,22,25)
)
AND "TYPE_ID" IN (6)
)
SELECT * FROM structures
JOIN t ON (
structures.id = entries_properties.entry_id
)Code Snippets
Filter: (((properties."COMPOSITION")::text ~~ 'Mo%'::text) AND (((properties."NAME")::text ~~ '%VASP-ase-preopt%'::text) OR (properties."CALCULATOR_ID" = ANY ('{7,22,25}'::integer[]))))CREATE INDEX ON properties USING ( "TYPE_ID", "CALCULATOR_ID" );
-- the gist_trgm_ops may or may not be needed depending on selectivity of above.
CREATE INDEX ON properties USING GIST (
"COMPOSITION" gist_trgm_ops,
"NAME" gist_trgm_ops
);
ANALYZE properties;WITH t AS (
SELECT *
FROM properties.
WHERE "COMPOSITION" LIKE 'Mo%'
AND (
"NAME" LIKE '%VASP-ase-preopt%'
OR "CALCULATOR_ID" IN (7,22,25)
)
AND "TYPE_ID" IN (6)
)
SELECT * FROM structures
JOIN t ON (
structures.id = entries_properties.entry_id
)Context
StackExchange Database Administrators Q#181674, answer score: 28
Revisions (0)
No revisions yet.