patternsqlMinor

Group By primary key or DISTINCT increase query time over 1000x with limit

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Viewed 0 times

distinctprimarygroupwithlimitquerytime1000xincreaseover

Problem

Also see https://stackoverflow.com/questions/17741167/hibernate-jpa-improve-performance-of-distinct-query but I realized this is mainly a PostgreSQL issue.

My application uses a 3rd party extension to PostgreSQL to for searching chemical structures. This is in general slow. I can not change the SQL directly as the application uses hibernate and native query is not an option.

I have a many-to-many relationship and the "Link-table" has an additional column. Basically I have a Mixture that consists of elements and an element occurs in the mixture at a certain percentage. favorite

I use Spring-Data JPA with QueryDSL, hibernate and PostgreSQL. I have a query with 2 Joins It's a many too many with a link-table that has additional columns. Bascially I have a Mixture that consists of elements and an element occurs in the mixture at a certain percentage.

I'm now searching all Mixtures that contain an element matching the given criteria. Because a mixture can have multiple elements that match the criteria, the query may return the same entity multiple times. I want to prevent that hence DISTINCT or GROUP BY primary key.

The query is also paged meaning it uses limit and offset. The query runs perfectly fine without either distinct or group by but then I can get duplicate rows. If I add either group by or distinct query is over 1000 times slower.

Query with DISTINCT (note SQL from hibernate):

select distinct
    simplecomp0_.chem_compound_id as chem1_0_, 
    --snipped about 10 more columns all short varchar or date fields
    from simple_compound simplecomp0_ 
    inner join compound_composition compositio1_ 
              on simplecomp0_.chem_compound_id=compositio1_.chem_compound_id 
    inner join chemical_structure chemicalst2_ 
              on compositio1_.chemical_structure_id=chemicalst2_.structure_id 
where 
    chemicalst2_.structure_id  @ ('CCNc1ccccc1', '')::bingo.sub
limit 5 
offset 5

EXPLAIN ANALYZE with DISTINCT:

```
"Limit (cost=5984.58..5984

Solution

First off, LIMIT / OFFSET without ORDER BY are of limited usefulness, since the order is arbitrary and can change any time (when VACUUM runs or when the table is manipulated in at any way). It is only somewhat reliable with read-only tables. That's fine if you don't care which rows you get back, but it may break paging.

You may be able to solve your conundrum with the good old EXISTS. PostgreSQL can stop searching for more hits as soon as the first is found - as opposed to your query with DISTINCT, where it tries to collect all matches.

It's hard to be more specific without knowing the table structure, cardinalities, index definitions and what's behind your peculiar WHERE expression. But this might just do it:

SELECT s.chem_compound_id AS chem1_0_ 
       -- 10 more columns all short varchar or date fields
FROM   simple_compound s
WHERE  EXISTS (
   SELECT 1
   FROM   compound_composition sc
   JOIN   chemical_structure   c  ON sc.chemical_structure_id = c.structure_id 
   WHERE  c.structure_id  @ ('CCNc1ccccc1', '')::bingo.sub
   AND    sc.chem_compound_id = s.chem_compound_id
   )
LIMIT  5 
OFFSET 5

This also assumes you are only interested in columns from simple_compound in the output.

Code Snippets

SELECT s.chem_compound_id AS chem1_0_ 
       -- 10 more columns all short varchar or date fields
FROM   simple_compound s
WHERE  EXISTS (
   SELECT 1
   FROM   compound_composition sc
   JOIN   chemical_structure   c  ON sc.chemical_structure_id = c.structure_id 
   WHERE  c.structure_id  @ ('CCNc1ccccc1', '')::bingo.sub
   AND    sc.chem_compound_id = s.chem_compound_id
   )
LIMIT  5 
OFFSET 5

Context

StackExchange Database Administrators Q#46668, answer score: 3

Revisions (0)

No revisions yet.