HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Joining two large tables on a date range with Postgres

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
tablespostgreswithrangedatetwolargejoining

Problem

I know this question must come up often, but I can't seem to find a recent answer that addresses something more simple like what I'm trying to do. I've read several similar questions.

I'm using PostgreSQL 9.4 and have 2 tables, each with close to 200M rows containing two types of metrics. The columns are the_date, feature_A, feature_B and 24 hourly metrics. The metrics in the two tables are different enough such that the tables can't be combined. Or at least I don't think so. Some metrics don't apply to all features and that's why they're separated. Both tables have indexes on the date column.

When I try to do a query that joins them, if I specify a single date the query will return in about 3 seconds which seems reasonable to me. But if I simply do a.the_date between '8/1/2017' and '8/1/2017' the query now takes essentially forever. I kill it after 10 minutes because I'm impatient. Obviously PostgreSQL is no longer using the indexes.

Why is that and how can I get it to use the indexes again? I'd like to do queries like a.the_date between '7/1/2017' and '7/31/2017' or a.the_date >= '1/1/2017'. Right now I've simply got a python program running the query for one date at a time and then totaling the results, which seems like a hack. But at least it's predictable and much faster at 3-seconds-per-date.

The query:

```
select a.feature_a, a.feature_b, count(*) tot_ct,
count(case when b.br01 > 0 and a.bc01 0 and a.bc02 0 and a.bc03 0 and a.bc04 0 and a.bc05 0 and a.bc06 0 and a.bc07 0 and a.bc08 0 and a.bc09 0 and a.bc10 0 and a.bc11 0 and a.bc12 0 and a.bc13 0 and a.bc14 0 and a.bc15 0 and a.bc16 0 and a.bc17 0 and a.bc18 0 and a.bc19 0 and a.bc20 0 and a.bc21 0 and a.bc22 0 and a.bc23 0 and a.bc24 <= 10 then 1 end) cx24,
avg(b.br01) av01, avg(b.br02) av02, avg(b.br03) av03,
avg(b.br04) av04, avg(b.br05) av05, avg(b.br06) av06,
avg(b.br07) av07, avg(b.br08) av08, avg(b.br09) av09,
avg(b.br10) av10, avg(b.br11) av11, av

Solution

Solved my index issue, for anyone who might find this...

It was as simple as adding the date range to the where clause of both tables. Now both indexes get used and no more full table scans.

and a.the_date between '7/1/2017' and '7/31/2017'
and b.the_date between '7/1/2017' and '7/31/2017'
and a.the_date = b.the_date


Seems unnecessary, but I remember doing things like this many years ago when optimizing Oracle queries. Sometimes I'd even have to do things like a.col1 = b.col1 and a.col1 = b.col1 because having the = clause in there once wasn't enough.

Code Snippets

and a.the_date between '7/1/2017' and '7/31/2017'
and b.the_date between '7/1/2017' and '7/31/2017'
and a.the_date = b.the_date

Context

StackExchange Database Administrators Q#184817, answer score: 5

Revisions (0)

No revisions yet.