HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Optimizing a query which runs through all hours in a day

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
allhoursqueryoptimizingthroughwhichrunsday

Problem

I need help optimizing a query. I'm using PostgreSQL 9.3.4 currently but can upgrade to 9.4 if needed.

I have a table with 60+ millions of records which looks like this:

Table "public.snapshots"
   Column   |           Type           |                         Modifiers                          | Storage  | Stats target | Description 
------------+--------------------------+------------------------------------------------------------+----------+--------------+-------------
 id         | integer                  | not null default nextval('snapshots_new_id_seq'::regclass) | plain    |              | 
 camera_id  | integer                  | not null                                                   | plain    |              | 
 created_at | timestamp with time zone | not null                                                   | plain    |              | 
 notes      | text                     |                                                            | extended |              | 
 data       | bytea                    | not null                                                   | extended |              | 
 is_public  | boolean                  | not null default false                                     | plain    |              | 
Indexes:
    "snapshots_new_created_at_camera_id_index" UNIQUE, btree (created_at, camera_id)
Foreign-key constraints:
    "snapshots_new_camera_id_fkey" FOREIGN KEY (camera_id) REFERENCES cameras(id) ON DELETE CASCADE
Has OIDs: no


There can be between 0 and 3600 snapshot records in an hour. For this query I'm interested only in knowing which hours in a given day for a given camera_id have 1 or more snapshot records (the actual count is unimportant).

Currently the application is implemented to execute one query for each hour in the day, like this:

```
SELECT count(*) AS "count" FROM "snapshots" WHERE (("snapshots"."camera_id" = 4809) AND ("created_at" >= '2015-05-24 23:00:00 UTC') AND ("created_at" = '2015-05-25 00:00:00 UTC

Solution

Just an idea: create a table with all hours in the day (0-23).

create table hours(
  hr integer
);


Then find all hours that have snapshots for the given camera_id and date (of course, you will need to substitute your own camera_id and date into the query):

select h.hr, 1 as camera_id
from hours h
where exists (
    select 1 
    from snapshots s
    where s.camera_id = 1
    and s.created_at between to_timestamp ('2015-05-01 ' || to_char(h.hr, '00') || ':00:00', 'YYYY-MM-DD HH24:MI:SS')
                             and to_timestamp ('2015-05-01 ' || to_char(h.hr, '00') || ':59:59', 'YYYY-MM-DD HH24:MI:SS')
)


See this on SQLFiddle. To return a count(*) from the table the database has to go through all the rows that satisfy the condition to count them; exists() should limit this to finding the first row that satisfies the condition.

Code Snippets

create table hours(
  hr integer
);
select h.hr, 1 as camera_id
from hours h
where exists (
    select 1 
    from snapshots s
    where s.camera_id = 1
    and s.created_at between to_timestamp ('2015-05-01 ' || to_char(h.hr, '00') || ':00:00', 'YYYY-MM-DD HH24:MI:SS')
                             and to_timestamp ('2015-05-01 ' || to_char(h.hr, '00') || ':59:59', 'YYYY-MM-DD HH24:MI:SS')
)

Context

StackExchange Database Administrators Q#103287, answer score: 3

Revisions (0)

No revisions yet.