Very slow execution of a simple query
Problem
I am trying to create a report from my data, but the query is really slow on a big table.
The table structure is:

```
CREATE TABLE posts
(
  id serial NOT NULL,
  project_id integer,
  moderation character varying(255),
  keyword_id integer,
  author_id integer,
  created_at timestamp without time zone,
  updated_at timestamp without time zone,
  server_id character varying(255),
  social_creation_time integer,
  social_id character varying(255),
  network character varying(255),
  mood character varying(255) DEFAULT NULL::character varying,
  url text,
  source_id integer,
  location character varying(255),
  subject_id integer,
  conversation_id integer,
  CONSTRAINT posts_pkey PRIMARY KEY (id)
);

CREATE INDEX index_posts_on_author_id ON posts (author_id);
CREATE INDEX index_posts_on_keyword_id ON posts (keyword_id);
CREATE INDEX index_posts_on_project_id_and_network_and_social_id
          ON posts (project_id, network, social_id);
CREATE INDEX index_posts_on_project_id_and_social_creation_time
          ON posts (project_id, social_creation_time DESC);
CREATE INDEX index_posts_on_server_id ON posts (server_id);
CREATE INDEX index_posts_on_social_id ON posts (social_id);
```

The query:
```
SELECT date_trunc('hour', timestamp 'epoch'
       + (posts.social_creation_time * INTERVAL '1 second')) AS creating,
       network,
       count(*) AS posts
FROM   posts
WHERE  posts.project_id = 7
AND    posts.moderation NOT IN ('junk', 'spam')
AND    posts.social_creation_time BETWEEN 1391716800 AND 1392839999
GROUP  BY network, creating
ORDER  BY creating;
```

The count is 3940689.
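For reference, the epoch bounds in the WHERE clause cover roughly 13 days of data; the same expression used in the SELECT list converts them to timestamps:

```
SELECT timestamp 'epoch' + 1391716800 * interval '1 second' AS lower_bound
     , timestamp 'epoch' + 1392839999 * interval '1 second' AS upper_bound;
-- lower_bound: 2014-02-06 20:00:00
-- upper_bound: 2014-02-19 19:59:59
```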
Explain plan:

```
GroupAggregate  (cost=631282.11..671932.05 rows=338750 width=12) (actual time=22576.318..23826.124 rows=1776 loops=1)
  ->  Sort  (cost=631282.11..639750.85 rows=3387494 width=12) (actual time=22576.188..23438.485 rows=3536790 loops=1)
        Sort Key: (date_trunc('hour'::text, ('1970-01-01 00:00:00'::timestamp without time zone + ((social_creation_time)::double precision * '00:00:01'::interval))))
        ->  Seq Scan on posts  (cost=0.00..205984.62 rows=3387494 width=12)
              Rows Removed by Filter: 404218
```
Solution
In addition to the good advice by @Craig and @dezso:
Statistics
The count is 3940689. Yet, your query plan says:

```
Seq Scan on posts  (cost=0.00..205984.62 rows=3387494 width=12)
```

And your count is based on a selection:

```
Rows Removed by Filter: 404218
```

4344907 (3940689 + 404218) >> 3387494. Your statistics are not up to date. Something may be wrong with your autovacuum settings, which among other things runs ANALYZE automatically. That is very bad for overall db performance. For the query at hand, run this before you retry anything:

```
ANALYZE posts;
```

If you can afford a lock on the table for some time, run

```
VACUUM FULL ANALYZE posts;
```

to clean house. More here.
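A quick way to see whether autovacuum has been keeping up on this table is the standard statistics view pg_stat_user_tables (no assumptions here beyond the table name):

```
SELECT relname, n_live_tup, n_dead_tup,
       last_vacuum, last_autovacuum,
       last_analyze, last_autoanalyze
FROM   pg_stat_user_tables
WHERE  relname = 'posts';
```

If last_autoanalyze is old (or NULL) while n_dead_tup is large, autovacuum is not doing its job for this table.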
Index
The numbers indicate your query uses about 90 % of all rows. Therefore, a sequential scan will be faster than any possible index scan - except for a covering index (index-only scan). Requires Postgres 9.2+. Be sure to read the Postgres Wiki on the topic.
Since you only use two small columns out of a long list of columns, such an index would be smaller and faster. While you are at it, and depending on your overall requirements, a tailored index like the following would squeeze out the maximum performance: a partial, functional, multicolumn, covering index - at some cost for write operations:
```
CREATE INDEX test_idx ON posts (
       date_trunc('hour', timestamp 'epoch' + social_creation_time * interval '1 sec')
     , network)
WHERE  moderation NOT IN ('junk','spam')
AND    project_id = 7                                           -- ??
AND    social_creation_time BETWEEN 1391716800 AND 1392839999;  -- ??
```

Actual WHERE conditions depend on your actual requirements and need to be added in a more or less identical form to any query that wishes to use this index. Trim rows that are never used from the index. Only use conditions that eliminate more than a few rows; tailor a super-set of the rows needed for your queries.

Generally, covering indexes work well for fairly static tables. Read the Wiki. A quick test:
```
SELECT relallvisible, relpages
FROM   pg_class
WHERE  oid = 'posts'::regclass;
```

If relallvisible is not much smaller than relpages, chances are good. Be sure autovacuum is running properly before you try this.

I would also test without the functional aspect to see which is used / faster:
```
CREATE INDEX test_idx ON posts (social_creation_time, network)
WHERE  moderation NOT IN ('junk','spam')
AND    project_id = 7                                           -- ??
AND    social_creation_time BETWEEN 1391716800 AND 1392839999;  -- ??
```

Search for related answers dealing with a partial covering index.
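Either way, EXPLAIN shows whether the planner actually picks the new index and whether you get the hoped-for index-only scan; this simply wraps the original report query:

```
EXPLAIN (ANALYZE, BUFFERS)
SELECT date_trunc('hour', timestamp 'epoch'
       + (posts.social_creation_time * INTERVAL '1 second')) AS creating,
       network,
       count(*) AS posts
FROM   posts
WHERE  posts.project_id = 7
AND    posts.moderation NOT IN ('junk','spam')
AND    posts.social_creation_time BETWEEN 1391716800 AND 1392839999
GROUP  BY network, creating
ORDER  BY creating;
```

Look for "Index Only Scan using test_idx" and a low "Heap Fetches" count in the output.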
Table layout
Finally, in your table definition, integer and text columns alternate, bloating the table quite a bit due to data alignment and padding. More in this related answer: Configuring PostgreSQL for read performance.
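To gauge how much padding costs you, pg_column_size() can report the actual size of whole row values; this is just a probe, and the numbers depend on your data:

```
SELECT avg(pg_column_size(p.*)) AS avg_row_bytes
FROM   posts p;
```

Comparing this average before and after reordering the columns shows the saving per row.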
I would recreate your table along these lines:
```
CREATE TABLE post (
  post_id              serial PRIMARY KEY,
  project_id           integer,
  created_at           timestamp,
  updated_at           timestamp,
  keyword_id           integer,
  author_id            integer,
  source_id            integer,
  subject_id           integer,
  conversation_id      integer,
  social_creation_time integer,
  server_id            text,  -- could be integer?
  social_id            text,
  moderation           text,
  network              text,
  url                  text,
  location             text,
  mood                 text
);
```

This will be a bit smaller, helping overall performance.
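To quantify the difference, compare the on-disk footprint of the two layouts; pg_total_relation_size() includes indexes and TOAST data (this assumes old and new tables exist side by side during the migration):

```
SELECT pg_size_pretty(pg_total_relation_size('posts')) AS old_layout,
       pg_size_pretty(pg_total_relation_size('post'))  AS new_layout;
```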
Why text instead of varchar(255)?
StackExchange Database Administrators Q#59170, answer score: 11