HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Query with merge join horrendously slow

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
mergewithqueryjoinslowhorrendously

Problem

I'm using a Postgres database and trying to optimize the following query:

```
SELECT DISTINCT catalogite1_.id
FROM cat_catalogitem catalogite1_
INNER JOIN cat_service service2_
ON catalogite1_.service_id=service2_.id
LEFT OUTER JOIN cat_entitlement_services entitledse6_
ON service2_.id=entitledse6_.service_id
LEFT OUTER JOIN cat_entitlement entitlemen7_
ON entitledse6_.entitlement_id=entitlemen7_.id
AND (entitlemen7_.id IN ('505e03e5-7370-42c2-a26e-bdb2df593934' ,
'508da3b6-7147-4b16-971f-6e6476b8ef44' ,
'6c68fbd2-7cc4-4b7c-85c1-617b69578ab9' ,
'6c9e5ff0-a073-4923-a5ec-b47f5e4c120a' ,
'961bee54-e9d6-402c-a763-c3937b03402f' ,
'2f113c9a-9e2f-47d8-beda-df0e05faa167' ,
'471bca1e-a112-4842-bdfc-252b8848b862' ,
'482ba515-2197-4fdb-a74b-37d9a0795c4e' ,
'872038e4-766a-4b93-bf95-aa2735e7f942' ,
'fd6345fc-8799-42a0-83d5-0234e450e397' ,
'7378e830-5271-482f-b73b-7ce4232b000d' ,
'6aeafe3b-aac9-4c67-8895-aa77da7a7d6b'))

LEFT OUTER JOIN cat_subtenant catalogsub3_
ON catalogite1_.subtenant_id=catalogsub3_.id
LEFT OUTER JOIN cat_entitlement_catalogitems entitledca4_
ON catalogite1_.id=entitledca4_.catalogitem_id
LEFT OUTER JOIN cat_entitlement entitlemen5_
ON entitledca4_.entitlement_id=entitlemen5_.id
AND (entitlemen5_.id IN ('505e03e5-7370-42c2-a26e-bdb2df593934' ,
'508da3b6-7147-4b16-971f-6e6476b8ef44' ,
'6c68fbd2-7cc4-4b7c-85c1-617b69578ab9' ,
'6c9e5ff0-a073-4923-a5ec-b47f5e4c120a' ,
'961bee54-e9d6-402c-a763-c3937b03402f' ,
'2f113c9a-9e2f-47d8-beda-df0e05faa167' ,

Solution

I'm not sure why this sorting is happening because there is no ORDER BY operation in my query.

It is sorting so that it can do the merge join. Merge joins require sorted input.


The sort seems to be an external disk sort, which could be why it's proving to be so costly.

No, the actual sorting shouldn't take much time at all (although you might want to increase work_mem anyway, that kind of sort probably doesn't need to be on disk. What is the current setting?). Once it has the sorted data, though, it has to re-probe that data again and again as part of the merge join. That is where the time is going, and some of that time gets attributed to the sort step. Also, with this kind of plan, the overhead of collecting the times to report for an EXPLAIN ANALYZE can be huge, leading to the query taking several times longer than if it were not being monitored. If you do EXPLAIN (ANALYZE, TIMING OFF), what do you get for the bottom line execution time?

If you were to get it to use a hash join instead of a merge join, it probably would not change anything because the re-probing would still have to happen, just through a different mechanism.

The likely problem is that the query is executing as two sub-branches, one off from catalogite1_ and one off from service2_, which are then effectively cartesian joined at the end. The filtering can't be done until the end because some of the data needed for the comparisons is coming from one branch, and some from the other. And it is effectively a cartesian join because service2_ only has one qualifying row in it, meaning catalogite1_.service_id=service2_.id is not very selective

I would try changing this part of the query:

ON              service2_.id=entitledse6_.service_id


to this:

ON              catalogite1_.service_id=entitledse6_.service_id


This might allow the filtering to occur at a much lower place in the query. If this works, then it would be interesting to know why the planner didn't make this switch for you--it should be capable of it. What is your setting for join_collapse_limit?

Also, things like this:

AND             ( 
                                   service2_.id=NULL 
                   OR              COALESCE(NULL) IS NULL)


Certainly don't help the planner make reasonable choices!

Code Snippets

ON              service2_.id=entitledse6_.service_id
ON              catalogite1_.service_id=entitledse6_.service_id
AND             ( 
                                   service2_.id=NULL 
                   OR              COALESCE(NULL) IS NULL)

Context

StackExchange Database Administrators Q#151226, answer score: 4

Revisions (0)

No revisions yet.