patternsqlModerate
Query detailing differences between rows for a large amount of data
Viewed 0 times
rowsamountquerydifferencesdetailingbetweenlargefordata
Problem
I have a number of large tables, each with >300 columns. The application I am using creates "archives" of changed rows by making a copy of the current row in a secondary table.
Consider a trivial example:
Archive table:
Before any updates are executed on
Therefore
I wish to create a report detailing the differences between rows, ordered by
At the moment I can't think of a sane way of doing this for a large amount of columns, without resorting to generating the queries programmatically - I may have to do this.
Open to lots of ideas, so I'll add a bounty to the question after 2 days.
Consider a trivial example:
CREATE TABLE dbo.bigtable
(
UpdateDate datetime,
PK varchar(12) PRIMARY KEY,
col1 varchar(100),
col2 int,
col3 varchar(20),
.
.
.
colN datetime
);Archive table:
CREATE TABLE dbo.bigtable_archive
(
UpdateDate datetime,
PK varchar(12) NOT NULL,
col1 varchar(100),
col2 int,
col3 varchar(20),
.
.
.
colN datetime
);Before any updates are executed on
dbo.bigtable, a copy of the row is created in dbo.bigtable_archive, then dbo.bigtable.UpdateDate is updated with the current date. Therefore
UNIONing the two tables together & grouping by PK creates a timeline of changes, when ordered by UpdateDate.I wish to create a report detailing the differences between rows, ordered by
UpdateDate, grouped by PK, in the following format:PK, UpdateDate, ColumnName, Old Value, New ValueOld Value and New Value can be the relevant columns cast to a VARCHAR(MAX) (there are no TEXT or BYTE columns involved), as I do not need to do any post-processing of the values themselves. At the moment I can't think of a sane way of doing this for a large amount of columns, without resorting to generating the queries programmatically - I may have to do this.
Open to lots of ideas, so I'll add a bounty to the question after 2 days.
Solution
This is not going to look pretty, especially given the more than 300 columns and unavailability of
The Transact-SQL of the above as I see it:
LAG, nor is it likely to perform exceedingly well, but just as something to start with, I would try the following approach:UNIONthe two tables.
- For each PK in the combined set, get its previous "incarnation" from the archive table (the implementation below uses
OUTER APPLY+TOP (1)as a poor man'sLAG).
- Cast each data column to
varchar(max)and unpivot them in pairs, i.e. the current and the previous value (CROSS APPLY (VALUES ...)works well for this operation).
- Finally, filter the results based on whether the values in each pair differ from each other.
The Transact-SQL of the above as I see it:
WITH
Combined AS
(
SELECT * FROM dbo.bigtable
UNION ALL
SELECT * FROM dbo.bigtable_archive
) AS derived,
OldAndNew AS
(
SELECT
this.*,
OldCol1 = last.Col1,
OldCol2 = last.Col2,
...
FROM
Combined AS this
OUTER APPLY
(
SELECT TOP (1)
*
FROM
dbo.bigtable_archive
WHERE
PK = this.PK
AND UpdateDate < this.UpdateDate
ORDER BY
UpdateDate DESC
) AS last
)
SELECT
t.PK,
t.UpdateDate,
x.ColumnName,
x.OldValue,
x.NewValue
FROM
OldAndNew AS t
CROSS APPLY
(
VALUES
('Col1', CAST(t.OldCol1 AS varchar(max), CAST(t.Col1 AS varchar(max))),
('Col2', CAST(t.OldCol2 AS varchar(max), CAST(t.Col2 AS varchar(max))),
...
) AS x (ColumnName, OldValue, NewValue)
WHERE
NOT EXISTS (SELECT x.OldValue INTERSECT x.NewValue)
ORDER BY
t.PK,
t.UpdateDate,
x.ColumnName
;Code Snippets
WITH
Combined AS
(
SELECT * FROM dbo.bigtable
UNION ALL
SELECT * FROM dbo.bigtable_archive
) AS derived,
OldAndNew AS
(
SELECT
this.*,
OldCol1 = last.Col1,
OldCol2 = last.Col2,
...
FROM
Combined AS this
OUTER APPLY
(
SELECT TOP (1)
*
FROM
dbo.bigtable_archive
WHERE
PK = this.PK
AND UpdateDate < this.UpdateDate
ORDER BY
UpdateDate DESC
) AS last
)
SELECT
t.PK,
t.UpdateDate,
x.ColumnName,
x.OldValue,
x.NewValue
FROM
OldAndNew AS t
CROSS APPLY
(
VALUES
('Col1', CAST(t.OldCol1 AS varchar(max), CAST(t.Col1 AS varchar(max))),
('Col2', CAST(t.OldCol2 AS varchar(max), CAST(t.Col2 AS varchar(max))),
...
) AS x (ColumnName, OldValue, NewValue)
WHERE
NOT EXISTS (SELECT x.OldValue INTERSECT x.NewValue)
ORDER BY
t.PK,
t.UpdateDate,
x.ColumnName
;Context
StackExchange Database Administrators Q#175000, answer score: 15
Revisions (0)
No revisions yet.