patternsqlMinor
Computing Matching Data Points from Fuzzy Timestamps in Postgresql
Viewed 0 times
postgresqltimestampspointsfromdatamatchingfuzzycomputing
Problem
I have a table containing multiple time series of different types. The timestamps of cohesive points from different series do not match exactly (i. e. the difference can be up to an hour).
Schema
Below is the schema with two example series:
Goal
The goal is to select one series together with the closest datapoint from another series. For the example dataset the result should be:
First working approach
My current approach is to select the best matching data point from the other series by a subquery:
This does not seem to be the best approach as the dataset is quite huge and thus performing many subqueries slows down th
Schema
Below is the schema with two example series:
CREATE TABLE series (id integer, series_type integer, charttime timestamp,
value integer, PRIMARY KEY (id));
INSERT INTO series VALUES (1, 1, '2018-03-01 12:10:00', 40),
(2, 1, '2018-03-01 13:25:00', 30), (3, 1, '2018-03-01 14:10:00', 50);
INSERT INTO series VALUES (4, 2, '2018-03-01 11:20:00', 2), (5, 2, '2018-03-01 12:15:00', 6),
(6, 2, '2018-03-01 13:00:00', 7), (7, 2, '2018-03-01 13:45:00', 1);
id |series_type |charttime |value |
---|------------|--------------------|------|
1 |1 |2018-03-01 12:10:00 |40 |
2 |1 |2018-03-01 13:25:00 |30 |
3 |1 |2018-03-01 14:10:00 |50 |
4 |2 |2018-03-01 11:20:00 |2 |
5 |2 |2018-03-01 12:15:00 |6 |
7 |2 |2018-03-01 13:45:00 |1 |
6 |2 |2018-03-01 13:00:00 |7 |Goal
The goal is to select one series together with the closest datapoint from another series. For the example dataset the result should be:
charttime |s1 |s2 |
--------------------|---|---|
2018-03-01 12:10:00 |40 |6 |
2018-03-01 13:25:00 |30 |1 |
2018-03-01 14:10:00 |50 |1 |First working approach
My current approach is to select the best matching data point from the other series by a subquery:
SELECT l.charttime, l.value AS s1,
( SELECT r.value
FROM series r
WHERE ABS( EXTRACT( EPOCH FROM l.charttime - r.charttime ) / 3600 ) < 1
AND r.series_type = 2
ORDER BY ABS( EXTRACT( EPOCH FROM l.charttime - r.charttime )) ASC LIMIT 1
) AS s2
FROM series l
WHERE l.series_type = 1
ORDER BY l.charttime ASCThis does not seem to be the best approach as the dataset is quite huge and thus performing many subqueries slows down th
Solution
In your second approach, with the self join, you could remove duplicates using
Partition by l.charttime, order by the time difference and filter for row_number = 1.
I think performance will be horrible, however. Because of the cartesian join this will be an O(size(series 1) x size(series 2)) operation.
Having both l.charttime and r.charttime inside functions may also be causing trouble. Try refactoring to (in pseudo code)
.. and see how the query plan looks. I presume there's an index on charttime. Without one no approach will be fast. Indeed, two partial indexes, one on Series 1 and one on Series 2 may be even better.
row_number(),Partition by l.charttime, order by the time difference and filter for row_number = 1.
I think performance will be horrible, however. Because of the cartesian join this will be an O(size(series 1) x size(series 2)) operation.
Having both l.charttime and r.charttime inside functions may also be causing trouble. Try refactoring to (in pseudo code)
r.charttime l.charttime - 3600.. and see how the query plan looks. I presume there's an index on charttime. Without one no approach will be fast. Indeed, two partial indexes, one on Series 1 and one on Series 2 may be even better.
Code Snippets
r.charttime < l.charttime + 3600
and r.charttime > l.charttime - 3600Context
StackExchange Database Administrators Q#211331, answer score: 4
Revisions (0)
No revisions yet.