
SQLite: writing a query that selects only the rows nearest to each hour

Problem

I have a data set in which readings were taken approximately every minute for about three months, with the time stored as a UNIX timestamp. The timestamps are irregular (e.g. there may be no reading at the zero minute of an hour: one reading may be at 00:59:55 and the next at 01:01:01), and some days may be missing.

For each hour, I need the row nearest to that hour, with its time rounded to the hour, provided the nearest reading is no more than 30 minutes away from the hour.
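The 30-minute limit falls out naturally from rounding each timestamp to its nearest hour. A minimal sketch, assuming integer UNIX timestamps in UTC, that uses Python's built-in sqlite3 module purely as a harness to evaluate the SQL expression on the two example times from the question (the helper name nearest_hour is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def nearest_hour(t):
    """Return (hour, distance_seconds, hour_as_text) for a UNIX timestamp t."""
    # Adding 1800 s and integer-dividing by 3600 rounds to the nearest hour.
    return conn.execute(
        "SELECT ((:t + 1800) / 3600) * 3600,"
        "       ABS(:t - ((:t + 1800) / 3600) * 3600),"
        "       datetime(((:t + 1800) / 3600) * 3600, 'unixepoch')",
        {"t": t},
    ).fetchone()

# The two readings from the question, placed on 1970-01-01 for simplicity:
# 00:59:55 (3595 s) and 01:01:01 (3661 s) both round to 01:00.
print(nearest_hour(3595))  # (3600, 5, '1970-01-01 01:00:00')
print(nearest_hour(3661))  # (3600, 61, '1970-01-01 01:00:00')
```

Because every reading is assigned to the hour it is closest to, its distance is never more than 1800 seconds; a reading at exactly hh:30:00 rounds up to the next hour.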

Where no matching reading exists for an hour, it would be helpful if the query still returned that hour, just without a value.

I realize I am asking a lot, but this would be incredibly helpful.

By the way, the table has just four columns: an autoincrementing primary key, the timestamp, the value, and a sensor ID (foreign key). This is what I have tried so far to get the data out:

SELECT strftime('%S', time, 'unixepoch'), 
       strftime('%M', time, 'unixepoch'), 
       strftime('%H', time, 'unixepoch'), 
       strftime('%d', time, 'unixepoch'),      
       strftime('%m', time, 'unixepoch'), 
       strftime('%Y', time, 'unixepoch'), 
       value 
FROM Timestream 
WHERE idSensor = 359;
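For inspecting the data, a single datetime() call can stand in for the six strftime() calls, since datetime(time, 'unixepoch') returns the UTC time as 'YYYY-MM-DD HH:MM:SS'. A minimal sketch, assuming the Timestream table described above; the column types and sample rows are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema mirroring the question's description of the table.
conn.execute(
    "CREATE TABLE Timestream (id INTEGER PRIMARY KEY, time INTEGER, "
    "value REAL, idSensor INTEGER)"
)
conn.executemany(
    "INSERT INTO Timestream (time, value, idSensor) VALUES (?, ?, ?)",
    [(3595, 20.5, 359), (3661, 20.7, 359), (3600, 99.0, 360)],
)

# One datetime() call replaces the separate second/minute/hour/day/month/year columns.
rows = conn.execute(
    "SELECT datetime(time, 'unixepoch') AS ts, value "
    "FROM Timestream WHERE idSensor = 359 ORDER BY time"
).fetchall()
for ts, value in rows:
    print(ts, value)
```

strftime('%Y-%m-%d %H:%M:%S', time, 'unixepoch') gives the same string if a custom format is needed.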

Solution

Assuming that you want a SQL-only solution and your table is defined as:

CREATE TABLE measurements (
  pk INTEGER PRIMARY KEY,
  timestamp INTEGER NOT NULL,
  value TEXT NOT NULL,
  sensor_id INTEGER);


then create the following views:

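CREATE VIEW vw_measurements AS
SELECT pk, timestamp, value, sensor_id,
       ((timestamp + 1800) / 3600) * 3600 AS hour,                      -- nearest hour
       ABS(timestamp - ((timestamp + 1800) / 3600) * 3600) AS distance  -- seconds, at most 1800
FROM measurements;
CREATE VIEW min_distances AS
SELECT sensor_id, hour, MIN(distance) AS min_distance
FROM vw_measurements
GROUP BY sensor_id, hour;


Because timestamp is an INTEGER column, (timestamp + 1800) / 3600 is an integer division that already rounds to the nearest hour, so each reading is assigned to the hour it is closest to and its distance can never exceed 1800 seconds (30 minutes). Grouping by sensor_id as well as hour keeps a reading from one sensor from hiding the readings of another.

Then you can get the records closest to every hour, per sensor, as:

SELECT m.sensor_id, m.hour, GROUP_CONCAT(m.timestamp), GROUP_CONCAT(m.value)
FROM vw_measurements m
JOIN min_distances d
  ON m.sensor_id = d.sensor_id AND m.hour = d.hour AND m.distance = d.min_distance
GROUP BY m.sensor_id, m.hour;

To restrict the output to one sensor, add WHERE m.sensor_id = 359 before the GROUP BY.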
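A minimal end-to-end sketch, run through Python's sqlite3 module on a few made-up readings. It assumes min_distances is grouped per sensor_id and hour; if it were grouped by hour alone, the sensor-360 reading at exactly 01:00 (distance 0) would be the only match for that hour and the sensor-359 reading would be dropped:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE measurements (
  pk INTEGER PRIMARY KEY,
  timestamp INTEGER NOT NULL,
  value TEXT NOT NULL,
  sensor_id INTEGER);

-- Nearest hour via integer division; distance is at most 1800 seconds.
CREATE VIEW vw_measurements AS
SELECT pk, timestamp, value, sensor_id,
       ((timestamp + 1800) / 3600) * 3600 AS hour,
       ABS(timestamp - ((timestamp + 1800) / 3600) * 3600) AS distance
FROM measurements;

-- Smallest distance per sensor and hour.
CREATE VIEW min_distances AS
SELECT sensor_id, hour, MIN(distance) AS min_distance
FROM vw_measurements
GROUP BY sensor_id, hour;
""")

# Made-up readings: sensor 359 around 01:00 and 02:00, sensor 360 at 01:00.
conn.executemany(
    "INSERT INTO measurements (timestamp, value, sensor_id) VALUES (?, ?, ?)",
    [
        (3595, "a", 359),  # 00:59:55, 5 s before 01:00
        (3661, "b", 359),  # 01:01:01, 61 s after 01:00
        (7300, "c", 359),  # 02:01:40, 100 s after 02:00
        (3600, "x", 360),  # exactly 01:00, but a different sensor
    ],
)

rows = conn.execute("""
SELECT m.sensor_id, m.hour, GROUP_CONCAT(m.timestamp), GROUP_CONCAT(m.value)
FROM vw_measurements m
JOIN min_distances d
  ON m.sensor_id = d.sensor_id AND m.hour = d.hour AND m.distance = d.min_distance
GROUP BY m.sensor_id, m.hour
ORDER BY m.sensor_id, m.hour
""").fetchall()
for row in rows:
    print(row)
```

Note that GROUP_CONCAT returns text, so the timestamp column comes back as a string such as '3595'.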


WARNING: SQLite (at least the version this was tested on) lacks many useful constructs that other SQL dialects have (window functions, for example, only arrived in SQLite 3.25.0), so the solution above cannot discriminate between records whose timestamps are equally close to their nearest hour (either identical timestamps, or hour - d and hour + d for the same d); hence the GROUP_CONCAT aggregates. Typically each row will contain a single timestamp and value.
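On SQLite 3.25.0 or later, ROW_NUMBER() makes it possible to return exactly one row per sensor and hour. A minimal sketch, assuming that ties are broken in favour of the earlier timestamp; that rule is an arbitrary choice, so change the second ORDER BY term to prefer something else:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements (pk INTEGER PRIMARY KEY, timestamp INTEGER NOT NULL, "
    "value TEXT NOT NULL, sensor_id INTEGER)"
)
# Made-up data: two readings exactly 10 s either side of 01:00 (a tie).
conn.executemany(
    "INSERT INTO measurements (timestamp, value, sensor_id) VALUES (?, ?, ?)",
    [(3590, "early", 359), (3610, "late", 359), (7300, "c", 359)],
)

# ROW_NUMBER() requires SQLite 3.25.0 or later.
rows = conn.execute("""
WITH m AS (  -- tag each reading with its nearest hour
  SELECT sensor_id, timestamp, value,
         ((timestamp + 1800) / 3600) * 3600 AS hour
  FROM measurements
),
ranked AS (  -- rank readings by distance; earlier timestamp wins ties
  SELECT sensor_id, hour, timestamp, value,
         ROW_NUMBER() OVER (
           PARTITION BY sensor_id, hour
           ORDER BY ABS(timestamp - hour), timestamp
         ) AS rn
  FROM m
)
SELECT sensor_id, hour, timestamp, value
FROM ranked
WHERE rn = 1
ORDER BY sensor_id, hour
""").fetchall()
for row in rows:
    print(row)
```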

Note that with many thousands of records the performance can be abysmal (tested on SQLite 3.7.9); it may be better to turn min_distances into a temporary table.
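A minimal sketch of the temporary-table variant, assuming the vw_measurements view defined earlier; the index name idx_min_distances is hypothetical, and whether the index helps depends on the data and the query plan. The temporary table is a snapshot, so it has to be rebuilt after new readings arrive:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE measurements (
  pk INTEGER PRIMARY KEY,
  timestamp INTEGER NOT NULL,
  value TEXT NOT NULL,
  sensor_id INTEGER);

CREATE VIEW vw_measurements AS
SELECT pk, timestamp, value, sensor_id,
       ((timestamp + 1800) / 3600) * 3600 AS hour,
       ABS(timestamp - ((timestamp + 1800) / 3600) * 3600) AS distance
FROM measurements;
""")
conn.executemany(
    "INSERT INTO measurements (timestamp, value, sensor_id) VALUES (?, ?, ?)",
    [(3595, "a", 359), (3661, "b", 359), (7300, "c", 359)],
)

# Materialize the per-hour minima once and index the join columns.
# An unqualified index on a TEMP table is created in the temp schema.
conn.executescript("""
CREATE TEMP TABLE min_distances AS
SELECT sensor_id, hour, MIN(distance) AS min_distance
FROM vw_measurements
GROUP BY sensor_id, hour;
CREATE INDEX idx_min_distances ON min_distances (sensor_id, hour);
""")

rows = conn.execute("""
SELECT m.sensor_id, m.hour, GROUP_CONCAT(m.timestamp), GROUP_CONCAT(m.value)
FROM vw_measurements m
JOIN min_distances d
  ON m.sensor_id = d.sensor_id AND m.hour = d.hour AND m.distance = d.min_distance
GROUP BY m.sensor_id, m.hour
ORDER BY m.sensor_id, m.hour
""").fetchall()
for row in rows:
    print(row)
```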

Also, this solution does not return a row with a NULL value for hours that have no records within plus or minus 30 minutes.
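Missing hours can be filled in by generating every hour with a recursive CTE (SQLite 3.8.3 or later) and LEFT JOINing the nearest readings onto it, so hours with no reading within 30 minutes come back with a NULL value. A minimal sketch for a single sensor, assuming SQLite 3.25.0 or later for ROW_NUMBER() and made-up data with a gap at 02:00; ties again go to the earlier timestamp, which is an arbitrary choice:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements (pk INTEGER PRIMARY KEY, timestamp INTEGER NOT NULL, "
    "value TEXT NOT NULL, sensor_id INTEGER)"
)
# Made-up readings for sensor 359 near 01:00 and 03:00; nothing near 02:00.
conn.executemany(
    "INSERT INTO measurements (timestamp, value, sensor_id) VALUES (?, ?, ?)",
    [(3595, "a", 359), (3661, "b", 359), (10900, "c", 359)],
)

QUERY = """
WITH RECURSIVE
  m AS (  -- readings of one sensor, each tagged with its nearest hour
    SELECT timestamp, value, ((timestamp + 1800) / 3600) * 3600 AS hour
    FROM measurements
    WHERE sensor_id = :sensor
  ),
  hours(hour) AS (  -- every hour from the first to the last tagged hour
    SELECT MIN(hour) FROM m
    UNION ALL
    SELECT hour + 3600 FROM hours WHERE hour < (SELECT MAX(hour) FROM m)
  ),
  nearest AS (  -- closest reading per hour; earlier timestamp wins ties
    SELECT hour, timestamp, value,
           ROW_NUMBER() OVER (PARTITION BY hour
                              ORDER BY ABS(timestamp - hour), timestamp) AS rn
    FROM m
  )
SELECT h.hour, datetime(h.hour, 'unixepoch'), n.value
FROM hours h
LEFT JOIN nearest n ON n.hour = h.hour AND n.rn = 1
ORDER BY h.hour
"""
rows = conn.execute(QUERY, {"sensor": 359}).fetchall()
for row in rows:
    print(row)
```

The 02:00 row comes back with None (SQL NULL) as its value, which matches the "time but no value" output asked for in the question.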

Context

StackExchange Database Administrators Q#37241, answer score: 2