patternsqlMinor
postgresql being slow on count distinct for dates
Viewed 0 times
postgresqldistinctdatesslowbeingforcount
Problem
I have a very simple, but very big, table.
Its schema is like this
Now, yaddate has an index by itself and it is also in other indexes together with other columns (eg. (yadda1, date)).
The table itself is some 100M rows.
When I run
the time needed to get the list is in the range of 200 seconds.
Explain Analyze tells me it's doing a seq scan and I don't understand why, since I the index is there.
First thing I am trying is reindex on the date only column index.
TIA.
Its schema is like this
(yadda int, yadda1 int, yaddate date, ... other stuff).Now, yaddate has an index by itself and it is also in other indexes together with other columns (eg. (yadda1, date)).
The table itself is some 100M rows.
When I run
select distinct date from mybigtable;the time needed to get the list is in the range of 200 seconds.
Explain Analyze tells me it's doing a seq scan and I don't understand why, since I the index is there.
First thing I am trying is reindex on the date only column index.
- Am I doing something wrong?
- Since obviously there's something I am missing about seq and index scan, can someone shed some light?
- How can I make that query faster?
TIA.
Solution
There is a trick with distinct to get it fast using index, that you can try.
It involves creating a function looking like that:
Then create an index on the column you want to count distinct, and
Try it, be beware, I'm not sure it will work right out of the box, as I quickly adapted it from a varchar function.
It involves creating a function looking like that:
CREATE OR REPLACE FUNCTION small_distinct(IN tablename character varying, IN fieldname character varying, IN sample anyelement DEFAULT '1800-01-01'::date)
RETURNS SETOF anyelement AS
$BODY$
BEGIN
EXECUTE 'SELECT '||fieldName||' FROM '||tableName||' ORDER BY '||fieldName
||' LIMIT 1' INTO result;
WHILE result IS NOT NULL LOOP
RETURN NEXT;
EXECUTE 'SELECT '||fieldName||' FROM '||tableName
||' WHERE '||fieldName||' > $1 ORDER BY ' || fieldName || ' LIMIT 1'
INTO result USING result;
END LOOP;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100
ROWS 1000;Then create an index on the column you want to count distinct, and
select small_distinct('yourtable', 'yaddate'); should return you the distinct values you want, without the need to read the table.Try it, be beware, I'm not sure it will work right out of the box, as I quickly adapted it from a varchar function.
Code Snippets
CREATE OR REPLACE FUNCTION small_distinct(IN tablename character varying, IN fieldname character varying, IN sample anyelement DEFAULT '1800-01-01'::date)
RETURNS SETOF anyelement AS
$BODY$
BEGIN
EXECUTE 'SELECT '||fieldName||' FROM '||tableName||' ORDER BY '||fieldName
||' LIMIT 1' INTO result;
WHILE result IS NOT NULL LOOP
RETURN NEXT;
EXECUTE 'SELECT '||fieldName||' FROM '||tableName
||' WHERE '||fieldName||' > $1 ORDER BY ' || fieldName || ' LIMIT 1'
INTO result USING result;
END LOOP;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100
ROWS 1000;Context
StackExchange Database Administrators Q#63268, answer score: 2
Revisions (0)
No revisions yet.