HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Most efficient way to query a date range in Postgresql

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlrangequeryefficientwaydatemost

Problem

I have a table with a timestamp with tz column. I want to perform a count / group by query for rows within a certain date range:

select count(1), url 
from app_pageview 
where viewed_at between '2019-01-01' and '2020-01-01'
group by 2
order by 1 desc 
limit 10;


viewed_at has a btree index applied, but when I view explain... it doesn't appear to be using the index:

postgres=# explain select count(1), url 
from app_pageview 
where viewed_at between '2019-01-01' and '2019-01-31'
group by 2
order by 1 desc 
limit 10;
                                                                                       QUERY PLAN                                                                                        
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2173025.08..2173025.10 rows=10 width=32)
   ->  Sort  (cost=2173025.08..2173025.33 rows=101 width=32)
         Sort Key: (count(1)) DESC
         ->  Finalize GroupAggregate  (cost=2172997.31..2173022.89 rows=101 width=32)
               Group Key: url
               ->  Gather Merge  (cost=2172997.31..2173020.87 rows=202 width=32)
                     Workers Planned: 2
                     ->  Sort  (cost=2171997.28..2171997.53 rows=101 width=32)
                           Sort Key: url
                           ->  Partial HashAggregate  (cost=2171992.91..2171993.92 rows=101 width=32)
                                 Group Key: url
                                 ->  Parallel Seq Scan on app_pageview  (cost=0.00..2163462.00 rows=1706182 width=24)
                                       Filter: ((viewed_at >= '2019-01-01 00:00:00+00'::timestamp with time zone) AND (viewed_at <= '2019-01-31 00:00:00+00'::timestamp with time zone))
 JIT:
   Functions: 13
   Options: Inlining true, Optimization true, Expressions true, Deforming true
(16 rows)


I have gener

Solution

What is order by 1? i think combining that with limit 10 is resulting in postgress not knowing which 10, so it first gets all of them and then returns 10. However this theoretically shouldn't slow down the query.

But, do try

explain select count(*)
from app_pageview
where viewed_at between '2019-01-01' and '2019-01-31';


and see if that simplifies the situation.

for the query with siteid, you may benefit from a multicolumn index with siteid, viewed_at, although postgres is able to combine indexes somtimes. get the single-column query working well first and then experiment with this.

Context

StackExchange Database Administrators Q#257065, answer score: 2

Revisions (0)

No revisions yet.