patternsqlMajor
Why did Postgres UPDATE take 39 hours?
Viewed 0 times
whyupdatepostgreshourstakedid
Problem
I have a Postgres table with ~2.1 million rows. I ran the below update on it:
This query took 39 hours to run. I am running this on a 4 (physical) core i7 Q720 laptop processor, plenty of RAM, nothing else running the vast majority of the time. No HDD space constraints. The table had recently been vacuumed, analyzed, and reindexed.
The whole time the query was running, at least after the initial
The laptop's power setting were on High performance (Windows 7 x64).
Here's the EXPLAIN:
WITH stops AS (
SELECT id,
rank() OVER (ORDER BY offense_timestamp,
defendant_dl,
offense_street_number,
offense_street_name) AS stop
FROM consistent.master
WHERE citing_jurisdiction=1
)
UPDATE consistent.master
SET arrest_id=stops.stop
FROM stops
WHERE master.id = stops.id;This query took 39 hours to run. I am running this on a 4 (physical) core i7 Q720 laptop processor, plenty of RAM, nothing else running the vast majority of the time. No HDD space constraints. The table had recently been vacuumed, analyzed, and reindexed.
The whole time the query was running, at least after the initial
WITH completed, CPU usage was usually low, and the HDD was in use 100%. The HDD was being used so hard that any other app ran considerably more slowly than normal.The laptop's power setting were on High performance (Windows 7 x64).
Here's the EXPLAIN:
Update on master (cost=822243.22..1021456.89 rows=2060910 width=312)
CTE stops
-> WindowAgg (cost=529826.95..581349.70 rows=2060910 width=33)
-> Sort (cost=529826.95..534979.23 rows=2060910 width=33)
Sort Key: consistent.master.offense_timestamp, consistent.master.defendant_dl, consistent.master.offense_street_number, consistent.master.offense_street_name
-> Seq Scan on master (cost=0.00..144630.06 rows=2060910 width=33)
Filter: (citing_jurisdiction = 1)
-> Hash Join (cost=240893.51..440107.19 rows=2060910 width=312)
Hash Cond: (stops.id = consistent.master.id)
-> CTE Scan on stops (cost=0.00..41218.20 rows=2060910 width=48)
-> Hash (cost=139413.45..139413.45 rows=2086645 width=268)
-> Seq Scan on master (cost=0.00..139413.45 rows=2086645 width=268)citing_jurisdiction=1 only excludes a few tens of thousands of rows. ESolution
I had something similar happen recently with a table of 3.5 million rows. My update would never finish. After a lot of experimenting and frustration, I finally found the culprit. It turned out to be the indexes on the table being updated.
The solution was to drop all indexes on the table being updated before running the update statement. Once I did that, the update finished in a few minutes. Once the update completed, I re-created the indexes and was back in business. This probably won't help you at this point but it may someone else looking for answers.
I'd keep the indexes on the table you are pulling the data from. That one won't have to keep updating any indexes and should help with finding the data you want to update. It ran fine on a slow laptop.
The solution was to drop all indexes on the table being updated before running the update statement. Once I did that, the update finished in a few minutes. Once the update completed, I re-created the indexes and was back in business. This probably won't help you at this point but it may someone else looking for answers.
I'd keep the indexes on the table you are pulling the data from. That one won't have to keep updating any indexes and should help with finding the data you want to update. It ran fine on a slow laptop.
Context
StackExchange Database Administrators Q#15720, answer score: 33
Revisions (0)
No revisions yet.