snippetsqlMinor
How do I merge similar records with different validity dates?
Viewed 0 times
mergewithrecordsdatesdifferentvalidityhowsimilar
Problem
The table I am working on has three components:
Values:
The table is updated by taking "snapshots" of another data source at some intervals and assigning validity dates to records. The problem is that these snapshots create duplicate entries for records (with different validity dates) that were not changed at all during that interval.
I want to reduce the size of the table by looking for rows with consecutive dates and by merging them and assigning them a single validity period. For example:
The logic I currently have is:
I understand that cursors are very inefficient (I have a large dataset), so I am looking for other approaches.
- An
IDcolumn (primary key in another table)
- Some data columns
- Date valid
from/tocolumns.
Values:
ID Data From To
1 a 2015-01-01 2015-01-05
1 a 2015-01-06 2015-01-10
1 b 2015-01-11 2015-01-15
1 a 2015-01-16 2015-01-20
2 c 2015-01-01 2015-01-05
2 c 2015-01-06 2015-01-10The table is updated by taking "snapshots" of another data source at some intervals and assigning validity dates to records. The problem is that these snapshots create duplicate entries for records (with different validity dates) that were not changed at all during that interval.
I want to reduce the size of the table by looking for rows with consecutive dates and by merging them and assigning them a single validity period. For example:
ID Data From To
1 a 2015-01-01 2015-01-10
1 b 2015-01-11 2015-01-15
1 a 2015-01-16 2015-01-20
2 c 2015-01-01 2015-01-10The logic I currently have is:
- Select and sort all rows by ID, data fields, and 'valid from' fields (so they are in groups of consecutive rows).
- Use a cursor to compare adjacent rows for similarity.
- If they are same, then merge rows and change validity period to include both rows.
I understand that cursors are very inefficient (I have a large dataset), so I am looking for other approaches.
Solution
If this is a table of back-to-back ranges only, your case can be treated as a classic "gaps and islands" problem, where you just need to isolate islands of consecutive ranges and then "condense" them by taking the minimum
There is an established method of solving this using two ROW_NUMBER calls:
This query will work in as low version as SQL Server 2005.
[from] and the maximum [to] per island.There is an established method of solving this using two ROW_NUMBER calls:
WITH islands AS
(
SELECT
id,
data,
[from],
[to],
island = ROW_NUMBER() OVER (PARTITION BY id ORDER BY [from])
- ROW_NUMBER() OVER (PARTITION BY id, data ORDER BY [from])
FROM
#mergeTest
)
SELECT
id,
data,
[from] = MIN([from]),
[to] = MAX([to])
FROM
islands
GROUP BY
id,
data,
island
;This query will work in as low version as SQL Server 2005.
Code Snippets
WITH islands AS
(
SELECT
id,
data,
[from],
[to],
island = ROW_NUMBER() OVER (PARTITION BY id ORDER BY [from])
- ROW_NUMBER() OVER (PARTITION BY id, data ORDER BY [from])
FROM
#mergeTest
)
SELECT
id,
data,
[from] = MIN([from]),
[to] = MAX([to])
FROM
islands
GROUP BY
id,
data,
island
;Context
StackExchange Database Administrators Q#107508, answer score: 9
Revisions (0)
No revisions yet.