HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Efficient PostgreSQL Updates Using a Temporary Table

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqltemporarytableefficientusingupdates

Problem

I've written a small program to import product-detail updates from file which takes much longer than expected. (I'll use trimmed-down examples for the sake of brevity.)

The program does the following:

  • Reads in data from file.



  • Performs certain modifications and creates an in-memory file.



  • Creates a temporary table to hold the processed file data.



  • COPYs the modified data into the temporary table.



  • Updates the actual table from the temporary table.



This all works fine, except the UPDATE query takes ~20 seconds for a small file of ~2000 rows.

The temporary table looks like this:

CREATE TEMPORARY TABLE tmp_products (
  product_id integer,
  detail text
);


And my update query is really straightforward:

UPDATE products
SET detail = t.detail
FROM tmp_products t
WHERE t.product_id = products.product_id


To speed things up, I tried the following with very little success:

Create a BTREE index on the temporary table.

CREATE INDEX tmp_products_idx
  ON tmp_products
  USING BTREE
  (product_id);


Creating a HASH index:

CREATE INDEX tmp_products_idx
  ON tmp_products
  USING HASH
  (product_id);


Neither index improved the update time significantly. Then I thought perhaps clustering the table would help, but that meant I couldn't use the HASH index. So I modified the queries in the program to use a BTREE index and then CLUSTER/ANALYZE:

CREATE INDEX tmp_products_idx
  ON tmp_products
  USING BTREE
  (product_id);

-- Program inserts data

CLUSTER tmp_products USING tmp_products_idx;
ANALYZE tmp_products;


This didn't help anything either. I took one more stab at it by using both a BTREE and HASH index hoping that the CLUSTER would use the BTREE and the UPDATE would use the HASH:

```
CREATE INDEX tmp_products_btree_idx
ON tmp_products
USING BTREE
(product_id);

CREATE INDEX tmp_products_hash_idx
ON tmp_products
USING BTREE
(product_id);

-- Program inserts data

CLUSTER tmp_products USING tmp_products_bt

Solution

Things that can be done, although whether they help or not... is another story:

-
Given that your UPDATE is quite simple, my first guess is your triggers are hurting your performance. Processing triggers is normally time-consuming, specially if they're written in any interpreted language (which means, more or less, they're not written in C). If you have possibility of checking with a development machine, test disabling all the triggers, and see what effect it has (time your queries!). Then reenable them one by one, and see which effect each one has on timings. You can find a few triggers hurting (a lot). If there are a few that consume a lot of time, have someone qualified revise them, optimize them, and even if necessary, rewrite them using C. My experience is that any kind of logging or auditing your inserts might make the process slower by (easily) a factor of 10. Take into account that, on top of your 14 triggers, the database might have added some more to make sure all constraints are met (CHECK, REFERENCES, UNIQUE, ...). It's not normally a good idea to try to disable those (and doing it is not straightforward, if possible at all).

-
Try to find out whether you really need all the indexes in your setup. Check the explanations about Unused Indexes on PostgreSQL wiki. The way your query is working (only updating the column detail), if detail is not part of any index, this wouldn't have much of an influence. PostgreSQL should be able to perform a Heap Only Tuple (HOT) update, and the indexes wouldn't have any big effect.

-
For HOT updates to succeed, you need some free space in your tables. So, make sure your tables have a fillfactor less than 100. From the docs on CREATE TABLE:


fillfactor (integer)


The fillfactor for a table is a percentage between 10 and 100. 100 (complete packing) is the default. When a smaller fillfactor is specified, INSERT operations pack table pages only to the indicated percentage; the remaining space on each page is reserved for updating rows on that page. This gives UPDATE a chance to place the updated copy of a row on the same page as the original, which is more efficient than placing it on a different page. For a table whose entries are never updated, complete packing is the best choice, but in heavily updated tables smaller fillfactors are appropriate. This parameter cannot be set for TOAST tables.

(emphasis mine)

-
Consider having a covering index on your temporary table. That is:

CREATE INDEX tmp_products_idx
ON tmp_products
USING BTREE
(product_id, detail);

ANALYZE tmp_products;


This makes sense only if the length of detail is moderate. I don't think this will make much of a difference... because this might allow for an Index Only Scan for the source part of your update, but you won't be certain unless you try it.

Some more information about your execution plans will be necessary to provide finer advice.

Code Snippets

CREATE INDEX tmp_products_idx
ON tmp_products
USING BTREE
(product_id, detail);

ANALYZE tmp_products;

Context

StackExchange Database Administrators Q#177898, answer score: 5

Revisions (0)

No revisions yet.