patternsqlMinor
Slow query performance due to temporary file?
Viewed 0 times
duefiletemporaryqueryslowperformance
Problem
Here is the query:
I have an index on
Each time I run the query from my application, I get the following in the logs:
I believe this temporary file creation is the cause of the slow performance.
Running an
I have then used http://explain.depesz.com/ to make it a bit more readable:
Per node type stats
`+-------------------+-------+--------------+------------+
| node type | count | sum of times | % of query |
+-------------------+-------+--------------+------------+
| Bitmap Heap Scan | 1 | 379.873 ms | 67.5 % |
| Bitmap Index Scan | 1 | 18.006 ms | 3.2 % |
| Sort | 1 | 164.756 ms | 29.3 % |
+-------------------+-------+----
SELECT "products".*
FROM "products"
WHERE (status > 100)
AND "products"."above_revenue_average" = 't'
AND ("products"."category_id" NOT IN (5))
ORDER BY "products"."start_date" DESCI have an index on
status and start_date.Each time I run the query from my application, I get the following in the logs:
[WHITE] temporary file:
path "pg_tblspc/16386/PG_9.3_201306121/pgsql_tmp/pgsql_tmp2544.0", size 37093376
Query: SELECT "products".* FROM "products" WHERE (status > 100)
AND "products"."above_revenue_average" = 't'
AND ("products"."category_id" NOT IN (5)) ORDER BY "products"."start_date" DESC
I believe this temporary file creation is the cause of the slow performance.
Running an
EXPLAIN ANALYZE I get the following results:QUERY PLAN
Sort (cost=63395.28..63403.51 rows=16460 width=524)
(actual time=524.134..562.635 rows=65294 loops=1)
Sort Key: start_date
Sort Method: external merge Disk: 36224kB
-> Bitmap Heap Scan on products
(cost=4803.40..60389.73 rows=16460 width=524)
(actual time=27.390..397.879 rows=65294 loops=1)
Recheck Cond: (status > 100)
Filter: (above_revenue_average AND (category_id <> 5))
Rows Removed by Filter: 25115
-> Bitmap Index Scan on index_products_on_status
(cost=0.00..4802.58 rows=89662 width=0)
(actual time=18.006..18.006 rows=90411 loops=1)
Index Cond: (status > 100)
Total runtime: 577.870 ms
(10 rows)
I have then used http://explain.depesz.com/ to make it a bit more readable:
Per node type stats
`+-------------------+-------+--------------+------------+
| node type | count | sum of times | % of query |
+-------------------+-------+--------------+------------+
| Bitmap Heap Scan | 1 | 379.873 ms | 67.5 % |
| Bitmap Index Scan | 1 | 18.006 ms | 3.2 % |
| Sort | 1 | 164.756 ms | 29.3 % |
+-------------------+-------+----
Solution
work_memObviously, the sort operation spills to disk:
Sort Method: external merge Disk: 36224kBMore
work_mem can help the query, like @Kassandry already suggested. Increase the setting until you see Memory instead of Disk in the EXPLAIN output. But it's probably a bad idea to increase the general setting based on one query. Proper setting depends on available RAM and your complete situation. Start by reading in the Postgres Wiki.To just fix your query, set
work_mem high enough for your transaction only with SET LOCAL in the same transaction.BEGIN;
SET LOCAL work_mem = '45MB';
SELECT ...
COMMIT; -- or ROLLBACKYou probably need a bit more than 40 MB. In-memory representation is a bit bigger than on-disk representation. Related:
- Configuration parameter work_mem in PostgreSQL on Linux
Query
Your query (after trimming some noise):
SELECT *
FROM products
WHERE status > 100
AND above_revenue_average -- boolean can be used directly
AND category_id <> 5
ORDER BY start_date DESC;Your rows are half a kilobyte wide (
width=524). Do you need to return all columns? (Typically, you don't.) Only list columns in the SELECT list which you need from your query to improve overall performance, especially since you have work_mem issues already.Can any of the involved columns be NULL? Particularly important for
category_id and start_date. You might want to adapt in that case ...Index
A multicolumn index can certainly help performance. (Like @Paul outlined). You have to weigh cost and gain. If performance for this query is important or it is very common, go for it. Don't create a special index for every query. As few as possible, as many as necessary. Indexes are more powerful when shared, that increases chances that more of them stay in cache.
A
boolean column like above_revenue_average is a typical candidate for a condition in a partial index rather than for an index column.- Unexpected Seq Scan when doing query against boolean with value NULL
My wild guess based on incomplete information:
CREATE INDEX prod_special_idx ON products (start_date DESC)
WHERE above_revenue_average
AND status > 100
AND category_id <> 5;Use
DESC NULLS LAST in index and query if start_date can be NULL.- PostgreSQL sort by datetime asc, null first?
Code Snippets
Sort Method: external merge Disk: 36224kBBEGIN;
SET LOCAL work_mem = '45MB';
SELECT ...
COMMIT; -- or ROLLBACKSELECT *
FROM products
WHERE status > 100
AND above_revenue_average -- boolean can be used directly
AND category_id <> 5
ORDER BY start_date DESC;CREATE INDEX prod_special_idx ON products (start_date DESC)
WHERE above_revenue_average
AND status > 100
AND category_id <> 5;Context
StackExchange Database Administrators Q#112079, answer score: 6
Revisions (0)
No revisions yet.