patternsqlMinor
Poor SELECT performance on recheck cond with data and large (millions of rows) partial index
Viewed 0 times
rowsindexcondmillionswithpoorrechecklargeperformancepartial
Problem
Query:
Index:
Plan:
Schema:
Data:
EXPLAIN (ANALYZE, BUFFERS) SELECT
COUNT (*) AS "count"
FROM
"Posts" AS "Post"
WHERE
"Post"."createdAt" > '2015-08-19 14:55:50.398'
AND "Post"."new" = TRUE;Index:
CREATE INDEX posts_new_createdat_idx ON "Posts" ("createdAt")
WHERE
NEW = TRUEPlan:
Aggregate (cost=389915.59..389915.60 rows=1 width=0) (actual time=4234.772..4234.773 rows=1 loops=1)
Buffers: shared hit=254427
-> Bitmap Heap Scan on "Posts" "Post" (cost=14415.81..387990.63 rows=769985 width=0) (actual time=123.805..3859.150 rows=1138854 loops=1)
Recheck Cond: (("createdAt" > '2015-08-19 14:55:50.398+00'::timestamp with time zone) AND new)
Rows Removed by Index Recheck: 8238790
Buffers: shared hit=254427
-> Bitmap Index Scan on posts_new_createdat_idx (cost=0.00..14223.32 rows=769985 width=0) (actual time=122.601..122.601 rows=1138854 loops=1)
Index Cond: ("createdAt" > '2015-08-19 14:55:50.398+00'::timestamp with time zone)
Buffers: shared hit=3114
Total runtime: 4234.989 msSchema:
CREATE TABLE "public"."Posts" (
"id" int4 NOT NULL DEFAULT nextval('"Posts_id_seq"'::regclass),
"actionId" int4,
"commentCount" int4 DEFAULT 0,
"facebook" bool,
"featurePostOnDate" timestamp(6) WITH TIME ZONE,
"forcedPrivate" bool,
"instagram" bool,
"isReported" bool,
"likeCount" int4 DEFAULT 0,
"note" text COLLATE "default",
"photo" varchar(255) COLLATE "default",
"private" bool,
"new" bool,
"popular" bool,
"twitter" bool,
"userId" int4,
"objectId" varchar(255) COLLATE "default",
"createdAt" timestamp(6) WITH TIME ZONE,
"updatedAt" timestamp(6) WITH TIME ZONE,
"activityLogId" int4,
"weightLogId" int4,
"workoutId" int4,
"workoutLogId" int4,
"thumbnail" varchar(255) COLLATE "default"
)Data:
new = truefor 99% or records
- Any posts older than 2 weeks can be ignored (from the count and the index)
Solution
Add the second predicate of your query to the partial index as well:
Your timestamp is probably a moving target, but I am going to assume you have lots of old rows that are excluded in most of your queries and only few "younger" rows are of interest. A typical use case. You can cut off old rows in your partial index.
This only makes sense if you can reduce the size of the index to a fraction. Else, the effect is typically not worth the trouble.
Replace with an opportune timestamp to cut off as many rows as possible while still covering relevant rows for your queries.
Modern versions of Postgres are smart enough to understand the index is applicable to queries cutting off at a later timestamp. For older versions you have to add the verbatim index condition redundantly to make the query planner use this partial index.
This index will be used by your query immediately as is. (I just removed some more noise):
Related:
-
Add datetime constraint to a PostgreSQL multi-column partial index
-
"Recheck Cond:" line in query plans with a bitmap index scan
Aside 1:
Aside 2: Don't use reserved words or CaMeL case or other illegal identifiers so you don't have to double-quote all the time. Very error prone.
WHERE "Post"."createdAt" > '2015-08-19 14:55:50.398'Your timestamp is probably a moving target, but I am going to assume you have lots of old rows that are excluded in most of your queries and only few "younger" rows are of interest. A typical use case. You can cut off old rows in your partial index.
CREATE INDEX posts_new_createdat_idx ON "Posts"("createdAt")
WHERE "new"
AND "createdAt" > '2015-08-01 00:00'; -- replace with useful timestampThis only makes sense if you can reduce the size of the index to a fraction. Else, the effect is typically not worth the trouble.
Replace with an opportune timestamp to cut off as many rows as possible while still covering relevant rows for your queries.
Modern versions of Postgres are smart enough to understand the index is applicable to queries cutting off at a later timestamp. For older versions you have to add the verbatim index condition redundantly to make the query planner use this partial index.
This index will be used by your query immediately as is. (I just removed some more noise):
SELECT COUNT(*) AS "count"
FROM "Posts"
WHERE "createdAt" > '2015-08-19 14:55:50.398'
AND "new";Related:
-
Add datetime constraint to a PostgreSQL multi-column partial index
-
"Recheck Cond:" line in query plans with a bitmap index scan
Aside 1:
"new" = TRUE is just a noisy way of stating "new" for a boolean value.Aside 2: Don't use reserved words or CaMeL case or other illegal identifiers so you don't have to double-quote all the time. Very error prone.
Code Snippets
WHERE "Post"."createdAt" > '2015-08-19 14:55:50.398'CREATE INDEX posts_new_createdat_idx ON "Posts"("createdAt")
WHERE "new"
AND "createdAt" > '2015-08-01 00:00'; -- replace with useful timestampSELECT COUNT(*) AS "count"
FROM "Posts"
WHERE "createdAt" > '2015-08-19 14:55:50.398'
AND "new";Context
StackExchange Database Administrators Q#138276, answer score: 3
Revisions (0)
No revisions yet.