patternsqlMinor
Is it a good idea to index timestamp in postgresql?
Viewed 0 times
postgresqlideagoodtimestampindex
Problem
Here is my table definition
The cardinality of the table is about 10-20 million rows
Here is the query I want to speed up
I am tempted to just index the timestamp, but I worry that since the precision of timestamp is millisecond, this means the the cardinality of the index itself will be almost as large as the cardinality of the table (eg. the number of rows will be equal to the number of index entry)
Should I have custom index that index with the precision of a day?
Like store an extra column
And index the text instead?
-- Table: public.my_table
-- DROP TABLE public.my_table;
CREATE TABLE public.my_table
(
my_row_id text COLLATE pg_catalog."default" NOT NULL,
my_item_id text COLLATE pg_catalog."default" NOT NULL,
my_timestamp timestamp with time zone NOT NULL DEFAULT now(),
-- ...... some other data
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;
ALTER TABLE public.my_table
OWNER to my_db;The cardinality of the table is about 10-20 million rows
Here is the query I want to speed up
select my_row_id, my_item_id
from my_table
where my_timestamp > timestamp '2018-09-13 00:00:00'
order by my_timestamp descI am tempted to just index the timestamp, but I worry that since the precision of timestamp is millisecond, this means the the cardinality of the index itself will be almost as large as the cardinality of the table (eg. the number of rows will be equal to the number of index entry)
Should I have custom index that index with the precision of a day?
Like store an extra column
text 2018-08-17 from timestamp with timezone 2018-08-17 07:06:43.508778And index the text instead?
Solution
For the query you give, the optimal general index would be on
Like store an extra column text 2018-08-17 from timestamp with timezone 2018-08-17 07:06:43.508778
There is little reason to think that this would be helpful in general, and certainly wouldn't help with the query you show (you would have to rewrite the query before it could use such an index, and then what happens if the timestamp is exactly '2018-09-13 00:00:00' ?)
(my_timestamp, my_row_id, my_item_id), with the order of the last two columns possibly switched. This will allow an index-only scan to be used which possibly doesn't visit the table at all, since all columns you reference would be available in the index. How much of a performance improvement this would give you depends on how much of your table meets the timestamp WHERE condition, and how well-vacuumed you keep the table. You might do even better with a partial index if the timestamp hardcoded into your query never changes, or changes in a very rote manner.Like store an extra column text 2018-08-17 from timestamp with timezone 2018-08-17 07:06:43.508778
There is little reason to think that this would be helpful in general, and certainly wouldn't help with the query you show (you would have to rewrite the query before it could use such an index, and then what happens if the timestamp is exactly '2018-09-13 00:00:00' ?)
Context
StackExchange Database Administrators Q#215308, answer score: 2
Revisions (0)
No revisions yet.