HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Slow query performance due to temporary file?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
duefiletemporaryqueryslowperformance

Problem

Here is the query:

SELECT "products".* 
FROM "products"
WHERE (status > 100) 
AND "products"."above_revenue_average" = 't' 
AND ("products"."category_id" NOT IN (5))
ORDER BY "products"."start_date" DESC


I have an index on status and start_date.

Each time I run the query from my application, I get the following in the logs:

[WHITE] temporary file:
path "pg_tblspc/16386/PG_9.3_201306121/pgsql_tmp/pgsql_tmp2544.0", size 37093376
Query: SELECT "products".* FROM "products" WHERE (status > 100)
AND "products"."above_revenue_average" = 't'
AND ("products"."category_id" NOT IN (5)) ORDER BY "products"."start_date" DESC


I believe this temporary file creation is the cause of the slow performance.

Running an EXPLAIN ANALYZE I get the following results:

QUERY PLAN
Sort (cost=63395.28..63403.51 rows=16460 width=524)
(actual time=524.134..562.635 rows=65294 loops=1)
Sort Key: start_date
Sort Method: external merge Disk: 36224kB
-> Bitmap Heap Scan on products
(cost=4803.40..60389.73 rows=16460 width=524)
(actual time=27.390..397.879 rows=65294 loops=1)
Recheck Cond: (status > 100)
Filter: (above_revenue_average AND (category_id <> 5))
Rows Removed by Filter: 25115
-> Bitmap Index Scan on index_products_on_status
(cost=0.00..4802.58 rows=89662 width=0)
(actual time=18.006..18.006 rows=90411 loops=1)
Index Cond: (status > 100)
Total runtime: 577.870 ms
(10 rows)


I have then used http://explain.depesz.com/ to make it a bit more readable:

Per node type stats

`+-------------------+-------+--------------+------------+
| node type | count | sum of times | % of query |
+-------------------+-------+--------------+------------+
| Bitmap Heap Scan | 1 | 379.873 ms | 67.5 % |
| Bitmap Index Scan | 1 | 18.006 ms | 3.2 % |
| Sort | 1 | 164.756 ms | 29.3 % |
+-------------------+-------+----

Solution

work_mem

Obviously, the sort operation spills to disk:

Sort Method: external merge  Disk: 36224kB


More work_mem can help the query, like @Kassandry already suggested. Increase the setting until you see Memory instead of Disk in the EXPLAIN output. But it's probably a bad idea to increase the general setting based on one query. Proper setting depends on available RAM and your complete situation. Start by reading in the Postgres Wiki.

To just fix your query, set work_mem high enough for your transaction only with SET LOCAL in the same transaction.

BEGIN;

SET LOCAL work_mem = '45MB';

SELECT ...

COMMIT;   -- or ROLLBACK


You probably need a bit more than 40 MB. In-memory representation is a bit bigger than on-disk representation. Related:

  • Configuration parameter work_mem in PostgreSQL on Linux



Query

Your query (after trimming some noise):

SELECT * 
FROM   products
WHERE  status > 100
AND    above_revenue_average  -- boolean can be used directly
AND    category_id <> 5
ORDER  BY start_date DESC;


Your rows are half a kilobyte wide (width=524). Do you need to return all columns? (Typically, you don't.) Only list columns in the SELECT list which you need from your query to improve overall performance, especially since you have work_mem issues already.

Can any of the involved columns be NULL? Particularly important for category_id and start_date. You might want to adapt in that case ...
Index

A multicolumn index can certainly help performance. (Like @Paul outlined). You have to weigh cost and gain. If performance for this query is important or it is very common, go for it. Don't create a special index for every query. As few as possible, as many as necessary. Indexes are more powerful when shared, that increases chances that more of them stay in cache.

A boolean column like above_revenue_average is a typical candidate for a condition in a partial index rather than for an index column.

  • Unexpected Seq Scan when doing query against boolean with value NULL



My wild guess based on incomplete information:

CREATE INDEX prod_special_idx ON products (start_date DESC)
WHERE  above_revenue_average
AND    status > 100
AND    category_id <> 5;


Use DESC NULLS LAST in index and query if start_date can be NULL.

  • PostgreSQL sort by datetime asc, null first?

Code Snippets

Sort Method: external merge  Disk: 36224kB
BEGIN;

SET LOCAL work_mem = '45MB';

SELECT ...

COMMIT;   -- or ROLLBACK
SELECT * 
FROM   products
WHERE  status > 100
AND    above_revenue_average  -- boolean can be used directly
AND    category_id <> 5
ORDER  BY start_date DESC;
CREATE INDEX prod_special_idx ON products (start_date DESC)
WHERE  above_revenue_average
AND    status > 100
AND    category_id <> 5;

Context

StackExchange Database Administrators Q#112079, answer score: 6

Revisions (0)

No revisions yet.