patternMinor
Consolidating a row of data, based on previous rows
Viewed 0 times
rowspreviousconsolidatingbasedrowdata
Problem
I'm trying to build a history table out of an audit log (ultimately to build out a type 2 dimension table). Unfortunately, the audit log only records the specific fields being changes. Here's a rough example of what I'm talking about;
The first entry for a particular staff member is for when their record is created, and each subsequent record is an update... but only shows the update to the field that was updated*. I want to "fill out" the update row with the rest of the employee record as it currently stands. ie, a result like this;
I know I can do this using a
*A NULL always means "value didn't change" rather than "value changed to NULL".
CREATE TABLE Staff(
[ID] int,
[Surname] varchar(5),
[FirstName] varchar(4),
[Office] varchar(9),
[Date] varchar(10)
);
INSERT INTO Staff ([ID], [Surname], [FirstName], [Office], [Date])
VALUES
(001, 'Smith', 'Bill', 'Melbourne', '2015-01-01'),
(001, NULL, NULL, 'Sydney', '2015-03-01'),
(002, 'Brown', 'Mary', 'Melbourne', '2014-04-01'),
(002, 'Jones', NULL, 'Adelaide', '2014-05-01'),
(002, NULL, NULL, 'Sydney', '2015-01-01'),
(002, NULL, NULL, 'Perth', '2015-03-01');The first entry for a particular staff member is for when their record is created, and each subsequent record is an update... but only shows the update to the field that was updated*. I want to "fill out" the update row with the rest of the employee record as it currently stands. ie, a result like this;
001, Smith, Bill, Melbourne, 2015-01-01
001, Smith, Bill, Sydney, 2015-03-01
002, Brown, Mary, Melbourne, 2014-04-01
002, Jones, Mary, Adelaide, 2014-05-01
002, Jones, Mary, Sydney, 2015-01-01
002, Jones, Mary, Perth, 2015-03-01I know I can do this using a
while loop or a cursor but I suspect there is probably a more performant option.*A NULL always means "value didn't change" rather than "value changed to NULL".
Solution
Sample data with the
The idea of the following solution is to lag back as many rows from the current row as there are preceding nulls:
For versions of SQL Server prior to 2012, the
Output:
This all became much easier since Microsoft added the
For more options, see The Last non NULL Puzzle by Itzik Ben-Gan.
Date column typed as date:CREATE TABLE dbo.Staff
(
[ID] integer NOT NULL,
[Surname] varchar(5) NULL,
[FirstName] varchar(4) NULL,
[Office] varchar(9) NULL,
[Date] date NOT NULL,
PRIMARY KEY (ID, [Date])
);
INSERT INTO Staff ([ID], [Surname], [FirstName], [Office], [Date])
VALUES
(001, 'Smith', 'Bill', 'Melbourne', '2015-01-01'),
(001, NULL, NULL, 'Sydney', '2015-03-01'),
(002, 'Brown', 'Mary', 'Melbourne', '2014-04-01'),
(002, 'Jones', NULL, 'Adelaide', '2014-05-01'),
(002, NULL, NULL, 'Sydney', '2015-01-01'),
(002, NULL, NULL, 'Perth', '2015-03-01');The idea of the following solution is to lag back as many rows from the current row as there are preceding nulls:
SELECT
G.ID,
Surname = LAG(G.Surname, G.SurnameLag) OVER (
PARTITION BY G.ID
ORDER BY G.[Date]),
FirstName = LAG(G.FirstName, G.FirstNameLag) OVER (
PARTITION BY G.ID
ORDER BY G.[Date]),
Office = LAG(G.Office, G.OfficeLag) OVER (
PARTITION BY G.ID
ORDER BY G.[Date]),
G.[Date]
FROM
(
-- Find the LAG offset per column
SELECT
S.ID,
S.Surname,
SurnameLag = SUM(IIF(S.Surname IS NULL, 1, 0)) OVER (
PARTITION BY S.ID
ORDER BY S.[Date]
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
S.FirstName,
FirstNameLag = SUM(IIF(S.FirstName IS NULL, 1, 0)) OVER (
PARTITION BY S.ID
ORDER BY S.[Date]
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
S.Office,
OfficeLag = SUM(IIF(S.Office IS NULL, 1, 0)) OVER (
PARTITION BY S.ID
ORDER BY S.[Date]
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
S.[Date]
FROM dbo.Staff AS S
) AS G
ORDER BY
G.ID, G.[Date];For versions of SQL Server prior to 2012, the
IIF expressions can be written as CASE WHEN IS NULL THEN 1 ELSE 0 END.Output:
This all became much easier since Microsoft added the
IGNORE NULLS option in SQL Server 2022.For more options, see The Last non NULL Puzzle by Itzik Ben-Gan.
Code Snippets
CREATE TABLE dbo.Staff
(
[ID] integer NOT NULL,
[Surname] varchar(5) NULL,
[FirstName] varchar(4) NULL,
[Office] varchar(9) NULL,
[Date] date NOT NULL,
PRIMARY KEY (ID, [Date])
);
INSERT INTO Staff ([ID], [Surname], [FirstName], [Office], [Date])
VALUES
(001, 'Smith', 'Bill', 'Melbourne', '2015-01-01'),
(001, NULL, NULL, 'Sydney', '2015-03-01'),
(002, 'Brown', 'Mary', 'Melbourne', '2014-04-01'),
(002, 'Jones', NULL, 'Adelaide', '2014-05-01'),
(002, NULL, NULL, 'Sydney', '2015-01-01'),
(002, NULL, NULL, 'Perth', '2015-03-01');SELECT
G.ID,
Surname = LAG(G.Surname, G.SurnameLag) OVER (
PARTITION BY G.ID
ORDER BY G.[Date]),
FirstName = LAG(G.FirstName, G.FirstNameLag) OVER (
PARTITION BY G.ID
ORDER BY G.[Date]),
Office = LAG(G.Office, G.OfficeLag) OVER (
PARTITION BY G.ID
ORDER BY G.[Date]),
G.[Date]
FROM
(
-- Find the LAG offset per column
SELECT
S.ID,
S.Surname,
SurnameLag = SUM(IIF(S.Surname IS NULL, 1, 0)) OVER (
PARTITION BY S.ID
ORDER BY S.[Date]
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
S.FirstName,
FirstNameLag = SUM(IIF(S.FirstName IS NULL, 1, 0)) OVER (
PARTITION BY S.ID
ORDER BY S.[Date]
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
S.Office,
OfficeLag = SUM(IIF(S.Office IS NULL, 1, 0)) OVER (
PARTITION BY S.ID
ORDER BY S.[Date]
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
S.[Date]
FROM dbo.Staff AS S
) AS G
ORDER BY
G.ID, G.[Date];Context
StackExchange Database Administrators Q#148641, answer score: 7
Revisions (0)
No revisions yet.