HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Consolidating a row of data, based on previous rows

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
rowspreviousconsolidatingbasedrowdata

Problem

I'm trying to build a history table out of an audit log (ultimately to build out a type 2 dimension table). Unfortunately, the audit log only records the specific fields being changes. Here's a rough example of what I'm talking about;

CREATE TABLE Staff(
  [ID] int, 
  [Surname] varchar(5), 
  [FirstName] varchar(4), 
  [Office] varchar(9), 
  [Date] varchar(10)
);

INSERT INTO Staff ([ID], [Surname], [FirstName], [Office], [Date])
VALUES
  (001, 'Smith', 'Bill', 'Melbourne', '2015-01-01'),
  (001, NULL, NULL, 'Sydney', '2015-03-01'),
  (002, 'Brown', 'Mary', 'Melbourne', '2014-04-01'),
  (002, 'Jones', NULL, 'Adelaide', '2014-05-01'),
  (002, NULL, NULL, 'Sydney', '2015-01-01'),
  (002, NULL, NULL, 'Perth', '2015-03-01');


The first entry for a particular staff member is for when their record is created, and each subsequent record is an update... but only shows the update to the field that was updated*. I want to "fill out" the update row with the rest of the employee record as it currently stands. ie, a result like this;

001, Smith, Bill, Melbourne, 2015-01-01
001, Smith, Bill, Sydney, 2015-03-01
002, Brown, Mary, Melbourne, 2014-04-01
002, Jones, Mary, Adelaide, 2014-05-01
002, Jones, Mary, Sydney, 2015-01-01
002, Jones, Mary, Perth, 2015-03-01


I know I can do this using a while loop or a cursor but I suspect there is probably a more performant option.

*A NULL always means "value didn't change" rather than "value changed to NULL".

Solution

Sample data with the Date column typed as date:

CREATE TABLE dbo.Staff
(
  [ID] integer NOT NULL, 
  [Surname] varchar(5) NULL, 
  [FirstName] varchar(4) NULL, 
  [Office] varchar(9) NULL, 
  [Date] date NOT NULL,

  PRIMARY KEY (ID, [Date])
);

INSERT INTO Staff ([ID], [Surname], [FirstName], [Office], [Date])
VALUES
  (001, 'Smith', 'Bill', 'Melbourne', '2015-01-01'),
  (001, NULL, NULL, 'Sydney', '2015-03-01'),
  (002, 'Brown', 'Mary', 'Melbourne', '2014-04-01'),
  (002, 'Jones', NULL, 'Adelaide', '2014-05-01'),
  (002, NULL, NULL, 'Sydney', '2015-01-01'),
  (002, NULL, NULL, 'Perth', '2015-03-01');


The idea of the following solution is to lag back as many rows from the current row as there are preceding nulls:

SELECT
    G.ID,
    Surname = LAG(G.Surname, G.SurnameLag) OVER (
        PARTITION BY G.ID 
        ORDER BY G.[Date]),
    FirstName = LAG(G.FirstName, G.FirstNameLag) OVER (
        PARTITION BY G.ID 
        ORDER BY G.[Date]),
    Office = LAG(G.Office, G.OfficeLag) OVER (
        PARTITION BY G.ID 
        ORDER BY G.[Date]),
    G.[Date]
FROM 
(
    -- Find the LAG offset per column
    SELECT
        S.ID,
        S.Surname,
        SurnameLag = SUM(IIF(S.Surname IS NULL, 1, 0)) OVER (
            PARTITION BY S.ID
            ORDER BY S.[Date]
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
        S.FirstName,
        FirstNameLag = SUM(IIF(S.FirstName IS NULL, 1, 0)) OVER (
            PARTITION BY S.ID
            ORDER BY S.[Date]
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
        S.Office,
        OfficeLag = SUM(IIF(S.Office IS NULL, 1, 0)) OVER (
            PARTITION BY S.ID
            ORDER BY S.[Date]
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
        S.[Date]
    FROM dbo.Staff AS S
) AS G
ORDER BY
    G.ID, G.[Date];


For versions of SQL Server prior to 2012, the IIF expressions can be written as CASE WHEN IS NULL THEN 1 ELSE 0 END.

Output:

This all became much easier since Microsoft added the IGNORE NULLS option in SQL Server 2022.

For more options, see The Last non NULL Puzzle by Itzik Ben-Gan.

Code Snippets

CREATE TABLE dbo.Staff
(
  [ID] integer NOT NULL, 
  [Surname] varchar(5) NULL, 
  [FirstName] varchar(4) NULL, 
  [Office] varchar(9) NULL, 
  [Date] date NOT NULL,

  PRIMARY KEY (ID, [Date])
);

INSERT INTO Staff ([ID], [Surname], [FirstName], [Office], [Date])
VALUES
  (001, 'Smith', 'Bill', 'Melbourne', '2015-01-01'),
  (001, NULL, NULL, 'Sydney', '2015-03-01'),
  (002, 'Brown', 'Mary', 'Melbourne', '2014-04-01'),
  (002, 'Jones', NULL, 'Adelaide', '2014-05-01'),
  (002, NULL, NULL, 'Sydney', '2015-01-01'),
  (002, NULL, NULL, 'Perth', '2015-03-01');
SELECT
    G.ID,
    Surname = LAG(G.Surname, G.SurnameLag) OVER (
        PARTITION BY G.ID 
        ORDER BY G.[Date]),
    FirstName = LAG(G.FirstName, G.FirstNameLag) OVER (
        PARTITION BY G.ID 
        ORDER BY G.[Date]),
    Office = LAG(G.Office, G.OfficeLag) OVER (
        PARTITION BY G.ID 
        ORDER BY G.[Date]),
    G.[Date]
FROM 
(
    -- Find the LAG offset per column
    SELECT
        S.ID,
        S.Surname,
        SurnameLag = SUM(IIF(S.Surname IS NULL, 1, 0)) OVER (
            PARTITION BY S.ID
            ORDER BY S.[Date]
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
        S.FirstName,
        FirstNameLag = SUM(IIF(S.FirstName IS NULL, 1, 0)) OVER (
            PARTITION BY S.ID
            ORDER BY S.[Date]
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
        S.Office,
        OfficeLag = SUM(IIF(S.Office IS NULL, 1, 0)) OVER (
            PARTITION BY S.ID
            ORDER BY S.[Date]
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
        S.[Date]
    FROM dbo.Staff AS S
) AS G
ORDER BY
    G.ID, G.[Date];

Context

StackExchange Database Administrators Q#148641, answer score: 7

Revisions (0)

No revisions yet.