patternsqlMinor
Postgres query plan has high rowcount, slow execution
Viewed 0 times
postgresqueryhighrowcountslowplanhasexecution
Problem
I've identified this snippet of a query as running at incredibly slow speeds, although it's quite simple. It aims to find all the records in Table1 which don't have a more recent entry in Table2 with the same SessionID.
Where Table1 is a temporary table with 2640 rows and Table2 is permanent with 3756955 rows.
Explain analyze gives the following output:
I'm guessing that the slow execution time is down to the high row count, estimating nearly 3bn in the loop with actual count close to 3m, and 1.1m in the index scan and finding 1102, but I can't find the reason for this. This is just a small dataset for testing, when I used it on the full dataset (340,000 rows in Table1), it filled 45GB of temp files before I cancelled it.
What I've tried so far:
SELECT t1."VisitorID", t1."Timestamp"
FROM "Table1" t1
JOIN Table2 s ON t2."Timestamp" > t1."Timestamp"
AND t2."SessionID" = t1."SessionID"
GROUP BY sgs."VisitorID", sgs."Timestamp"
HAVING COUNT(*) = 0Where Table1 is a temporary table with 2640 rows and Table2 is permanent with 3756955 rows.
Explain analyze gives the following output:
HashAggregate (cost=119823801.74..119823830.53 rows=2303 width=24) (actual time=5172.891..5172.891 rows=0 loops=1)
Group Key: t1."VisitorID", t1."Timestamp"
Filter: (count(*) = 0)
Rows Removed by Filter: 2228
-> Nested Loop (cost=0.43..97379087.60 rows=2992628552 width=24) (actual time=0.036..3986.857 rows=2909911 loops=1)
-> Seq Scan on "Table1" t1 (cost=0.00..80.40 rows=2640 width=61) (actual time=0.007..1.588 rows=2640 loops=1)
-> Index Scan using "IX_Table2_Timestamp" on "Table2" t2 (cost=0.43..25191.47 rows=1169452 width=45) (actual time=0.036..1.120 rows=1102 loops=2640)
Index Cond: ("Timestamp" >= t1."Timestamp")
Filter: ((t1."SessionID")::text = ("SessionID")::text)
Rows Removed by Filter: 71
"Planning time: 0.243 ms"
"Execution time: 5172.947 ms"I'm guessing that the slow execution time is down to the high row count, estimating nearly 3bn in the loop with actual count close to 3m, and 1.1m in the index scan and finding 1102, but I can't find the reason for this. This is just a small dataset for testing, when I used it on the full dataset (340,000 rows in Table1), it filled 45GB of temp files before I cancelled it.
What I've tried so far:
- VACUUM ANALYZE on all tables involved before that part of the query runs.
- Indexing all columns involved.
- Replacing the greater-than with equals on Timestamp
- Removing
Solution
(Unsure whether this belongs as an answer, comment or edit so I'll leave it here unless I hear otherwise.)
I was unable to find the cause of the bad row count. However, after rewriting the query like this,
the estimated row count went to a normal level and execution time decreased drastically, even on large datasets. I tested this with and without composite indexes like so:
and found no significant performance difference.
I was unable to find the cause of the bad row count. However, after rewriting the query like this,
SELECT t1."VisitorID", t1."Timestamp"
FROM Table1 t1
WHERE NOT EXISTS (
SELECT t2.VisitorID, t2.Timestamp
FROM Table2 t2
WHERE t2.SessionID = t1.SessionID
AND t2.Timestamp > t1.Timestamp
)the estimated row count went to a normal level and execution time decreased drastically, even on large datasets. I tested this with and without composite indexes like so:
CREATE INDEX "IX_Temp_Table1_Visitor-Session-Time"
ON "Table1"
USING BTREE ("VisitorID","SessionID","Timestamp");
CREATE INDEX "IX_Table2_Visitor-Session-Time"
ON "Table2"
USING BTREE ("VisitorID","SessionID","Timestamp");and found no significant performance difference.
Code Snippets
SELECT t1."VisitorID", t1."Timestamp"
FROM Table1 t1
WHERE NOT EXISTS (
SELECT t2.VisitorID, t2.Timestamp
FROM Table2 t2
WHERE t2.SessionID = t1.SessionID
AND t2.Timestamp > t1.Timestamp
)CREATE INDEX "IX_Temp_Table1_Visitor-Session-Time"
ON "Table1"
USING BTREE ("VisitorID","SessionID","Timestamp");
CREATE INDEX "IX_Table2_Visitor-Session-Time"
ON "Table2"
USING BTREE ("VisitorID","SessionID","Timestamp");Context
StackExchange Database Administrators Q#108842, answer score: 2
Revisions (0)
No revisions yet.