patternsqlMinor
Group By primary key or DISTINCT increase query time over 1000x with limit
Viewed 0 times
distinctprimarygroupwithlimitquerytime1000xincreaseover
Problem
Also see https://stackoverflow.com/questions/17741167/hibernate-jpa-improve-performance-of-distinct-query but I realized this is mainly a PostgreSQL issue.
My application uses a 3rd party extension to PostgreSQL to for searching chemical structures. This is in general slow. I can not change the SQL directly as the application uses hibernate and native query is not an option.
I have a many-to-many relationship and the "Link-table" has an additional column. Basically I have a Mixture that consists of elements and an element occurs in the mixture at a certain percentage. favorite
I use Spring-Data JPA with QueryDSL, hibernate and PostgreSQL. I have a query with 2 Joins It's a many too many with a link-table that has additional columns. Bascially I have a Mixture that consists of elements and an element occurs in the mixture at a certain percentage.
I'm now searching all Mixtures that contain an element matching the given criteria. Because a mixture can have multiple elements that match the criteria, the query may return the same entity multiple times. I want to prevent that hence DISTINCT or GROUP BY primary key.
The query is also paged meaning it uses limit and offset. The query runs perfectly fine without either distinct or group by but then I can get duplicate rows. If I add either group by or distinct query is over 1000 times slower.
Query with DISTINCT (note SQL from hibernate):
EXPLAIN ANALYZE with DISTINCT:
```
"Limit (cost=5984.58..5984
My application uses a 3rd party extension to PostgreSQL to for searching chemical structures. This is in general slow. I can not change the SQL directly as the application uses hibernate and native query is not an option.
I have a many-to-many relationship and the "Link-table" has an additional column. Basically I have a Mixture that consists of elements and an element occurs in the mixture at a certain percentage. favorite
I use Spring-Data JPA with QueryDSL, hibernate and PostgreSQL. I have a query with 2 Joins It's a many too many with a link-table that has additional columns. Bascially I have a Mixture that consists of elements and an element occurs in the mixture at a certain percentage.
I'm now searching all Mixtures that contain an element matching the given criteria. Because a mixture can have multiple elements that match the criteria, the query may return the same entity multiple times. I want to prevent that hence DISTINCT or GROUP BY primary key.
The query is also paged meaning it uses limit and offset. The query runs perfectly fine without either distinct or group by but then I can get duplicate rows. If I add either group by or distinct query is over 1000 times slower.
Query with DISTINCT (note SQL from hibernate):
select distinct
simplecomp0_.chem_compound_id as chem1_0_,
--snipped about 10 more columns all short varchar or date fields
from simple_compound simplecomp0_
inner join compound_composition compositio1_
on simplecomp0_.chem_compound_id=compositio1_.chem_compound_id
inner join chemical_structure chemicalst2_
on compositio1_.chemical_structure_id=chemicalst2_.structure_id
where
chemicalst2_.structure_id @ ('CCNc1ccccc1', '')::bingo.sub
limit 5
offset 5EXPLAIN ANALYZE with DISTINCT:
```
"Limit (cost=5984.58..5984
Solution
First off,
You may be able to solve your conundrum with the good old
It's hard to be more specific without knowing the table structure, cardinalities, index definitions and what's behind your peculiar
This also assumes you are only interested in columns from
LIMIT / OFFSET without ORDER BY are of limited usefulness, since the order is arbitrary and can change any time (when VACUUM runs or when the table is manipulated in at any way). It is only somewhat reliable with read-only tables. That's fine if you don't care which rows you get back, but it may break paging.You may be able to solve your conundrum with the good old
EXISTS. PostgreSQL can stop searching for more hits as soon as the first is found - as opposed to your query with DISTINCT, where it tries to collect all matches.It's hard to be more specific without knowing the table structure, cardinalities, index definitions and what's behind your peculiar
WHERE expression. But this might just do it:SELECT s.chem_compound_id AS chem1_0_
-- 10 more columns all short varchar or date fields
FROM simple_compound s
WHERE EXISTS (
SELECT 1
FROM compound_composition sc
JOIN chemical_structure c ON sc.chemical_structure_id = c.structure_id
WHERE c.structure_id @ ('CCNc1ccccc1', '')::bingo.sub
AND sc.chem_compound_id = s.chem_compound_id
)
LIMIT 5
OFFSET 5This also assumes you are only interested in columns from
simple_compound in the output.Code Snippets
SELECT s.chem_compound_id AS chem1_0_
-- 10 more columns all short varchar or date fields
FROM simple_compound s
WHERE EXISTS (
SELECT 1
FROM compound_composition sc
JOIN chemical_structure c ON sc.chemical_structure_id = c.structure_id
WHERE c.structure_id @ ('CCNc1ccccc1', '')::bingo.sub
AND sc.chem_compound_id = s.chem_compound_id
)
LIMIT 5
OFFSET 5Context
StackExchange Database Administrators Q#46668, answer score: 3
Revisions (0)
No revisions yet.