snippetsqlMinor
Most efficient way to generate a diff
Viewed 0 times
efficientwaygeneratediffmost
Problem
I have a table in SQL server that looks like this:
I am working on a stored procedure to diff, that takes input data and a version number.
The input data has columns from Name uptil fieldZ. Most of the field columns are expected to be NULL, i.e., each row usually has data for only the first few fields, the rest are NULL. The name, date and version form a unique constraint on the table.
I need to diff the data that is input with respect to this table, for a given version. Each row needs to be diffed - a row is identified by the name, date and version, and any change in any of the values in the field columns will need to show in the diff.
Update: all the fields need not be of type decimal. Some of them may be nvarchars. I would prefer the diff to happen without converting the type, although the diff output could convert everything to nvarchar since it is to be used only for display purposed.
Suppose the input is the following, and the requested version is 2,:
The diff needs to be in the following format:
My solution so far is to first generate a diff, using EXCEPT and UNION.
Then convert the diff to the desired output format using a JOIN and CROSS APPL
Id |Version |Name |date |fieldA |fieldB ..|fieldZ
1 |1 |Foo |20120101|23 | ..|25334123
2 |2 |Foo |20120101|23 |NULL ..|NULL
3 |2 |Bar |20120303|24 |123......|NULL
4 |2 |Bee |20120303|34 |-34......|NULLI am working on a stored procedure to diff, that takes input data and a version number.
The input data has columns from Name uptil fieldZ. Most of the field columns are expected to be NULL, i.e., each row usually has data for only the first few fields, the rest are NULL. The name, date and version form a unique constraint on the table.
I need to diff the data that is input with respect to this table, for a given version. Each row needs to be diffed - a row is identified by the name, date and version, and any change in any of the values in the field columns will need to show in the diff.
Update: all the fields need not be of type decimal. Some of them may be nvarchars. I would prefer the diff to happen without converting the type, although the diff output could convert everything to nvarchar since it is to be used only for display purposed.
Suppose the input is the following, and the requested version is 2,:
Name |date |fieldA |fieldB|..|fieldZ
Foo |20120101|25 |NULL |.. |NULL
Foo |20120102|26 |27 |.. |NULL
Bar |20120303|24 |126 |.. |NULL
Baz |20120101|15 |NULL |.. |NULLThe diff needs to be in the following format:
name |date |field |oldValue |newValue
Foo |20120101|FieldA |23 |25
Foo |20120102|FieldA |NULL |26
Foo |20120102|FieldB |NULL |27
Bar |20120303|FieldB |123 |126
Baz |20120101|FieldA |NULL |15My solution so far is to first generate a diff, using EXCEPT and UNION.
Then convert the diff to the desired output format using a JOIN and CROSS APPL
Solution
Edit regarding fields having different types, not just
You can try to use
By the way, it is a bad idea to store dates as
Instead of using
I would start with unpivoting the data, rather than doing it last (using
You can get rid of unpivoting of the input, if you do it in advance, on the caller side.
You would have to list all 100 columns only in
The final query is pretty simple, so temp table is not really needed. I think it is easier to write and maintain than your version.
Here is SQL Fiddle.
Set up sample data
Main query
Result
decimal.You can try to use
sql_variant type. I never used it personally, but it may be a good solution for your case. To try it just replace all decimal with sql_variant in the SQL script. The query itself remains exactly as it is, no explicit conversion is needed for performing the comparison. The end result would have a column with values of different types in it. Most likely, eventually you would have to know somehow which type is in which field to process the results in your application, but the query itself should work fine without conversions.By the way, it is a bad idea to store dates as
int.Instead of using
EXCEPT and UNION to calculate the diff, I'd use FULL JOIN. For me, personally, it is difficult to follow the logic behind EXCEPT and UNION approach.I would start with unpivoting the data, rather than doing it last (using
CROSS APPLY(VALUES) as you do).You can get rid of unpivoting of the input, if you do it in advance, on the caller side.
You would have to list all 100 columns only in
CROSS APPLY(VALUES).The final query is pretty simple, so temp table is not really needed. I think it is easier to write and maintain than your version.
Here is SQL Fiddle.
Set up sample data
DECLARE @TMain TABLE (
[ID] [int] NOT NULL,
[Version] [int] NOT NULL,
[Name] [nvarchar](50) NOT NULL,
[dt] [date] NOT NULL,
[FieldA] [decimal](38, 10) NULL,
[FieldB] [decimal](38, 10) NULL,
[FieldZ] [decimal](38, 10) NULL
);
INSERT INTO @TMain ([ID],[Version],[Name],[dt],[FieldA],[FieldB],[FieldZ]) VALUES
(1,1,'Foo','20120101',23,23 ,25334123),
(2,2,'Foo','20120101',23,NULL,NULL),
(3,2,'Bar','20120303',24,123 ,NULL),
(4,2,'Bee','20120303',34,-34 ,NULL);
DECLARE @TInput TABLE (
[Name] [nvarchar](50) NOT NULL,
[dt] [date] NOT NULL,
[FieldA] [decimal](38, 10) NULL,
[FieldB] [decimal](38, 10) NULL,
[FieldZ] [decimal](38, 10) NULL
);
INSERT INTO @TInput ([Name],[dt],[FieldA],[FieldB],[FieldZ]) VALUES
('Foo','20120101',25,NULL,NULL),
('Foo','20120102',26,27 ,NULL),
('Bar','20120303',24,126 ,NULL),
('Baz','20120101',15,NULL,NULL);
DECLARE @VarVersion int = 2;Main query
CTE_Main is unpivoted original data filtered to the given Version. CTE_Input is input table, which could be provided already in this format. Main query uses FULL JOIN, which adds to result rows with Bee. I think they should be returned, but if you don't want to see them, you can filter them out by adding AND CTE_Input.FieldValue IS NOT NULL or maybe using LEFT JOIN instead of FULL JOIN, I didn't look into details there, because I think they should be returned.WITH
CTE_Main
AS
(
SELECT
Main.ID
,Main.Version
,Main.Name
,Main.dt
,FieldName
,FieldValue
FROM
@TMain AS Main
CROSS APPLY
(
VALUES
('FieldA', Main.FieldA),
('FieldB', Main.FieldB),
('FieldZ', Main.FieldZ)
) AS CA(FieldName, FieldValue)
WHERE
Main.Version = @VarVersion
)
,CTE_Input
AS
(
SELECT
Input.Name
,Input.dt
,FieldName
,FieldValue
FROM
@TInput AS Input
CROSS APPLY
(
VALUES
('FieldA', Input.FieldA),
('FieldB', Input.FieldB),
('FieldZ', Input.FieldZ)
) AS CA(FieldName, FieldValue)
)
SELECT
ISNULL(CTE_Main.Name, CTE_Input.Name) AS FullName
,ISNULL(CTE_Main.dt, CTE_Input.dt) AS FullDate
,ISNULL(CTE_Main.FieldName, CTE_Input.FieldName) AS FullFieldName
,CTE_Main.FieldValue AS OldValue
,CTE_Input.FieldValue AS NewValue
FROM
CTE_Main
FULL JOIN CTE_Input ON
CTE_Input.Name = CTE_Main.Name
AND CTE_Input.dt = CTE_Main.dt
AND CTE_Input.FieldName = CTE_Main.FieldName
WHERE
(CTE_Main.FieldValue <> CTE_Input.FieldValue)
OR (CTE_Main.FieldValue IS NULL AND CTE_Input.FieldValue IS NOT NULL)
OR (CTE_Main.FieldValue IS NOT NULL AND CTE_Input.FieldValue IS NULL)
--ORDER BY FullName, FullDate, FullFieldName;Result
FullName FullDate FullFieldName OldValue NewValue
Foo 2012-01-01 FieldA 23.0000000000 25.0000000000
Foo 2012-01-02 FieldA NULL 26.0000000000
Foo 2012-01-02 FieldB NULL 27.0000000000
Bar 2012-03-03 FieldB 123.0000000000 126.0000000000
Baz 2012-01-01 FieldA NULL 15.0000000000
Bee 2012-03-03 FieldB -34.0000000000 NULL
Bee 2012-03-03 FieldA 34.0000000000 NULLCode Snippets
DECLARE @TMain TABLE (
[ID] [int] NOT NULL,
[Version] [int] NOT NULL,
[Name] [nvarchar](50) NOT NULL,
[dt] [date] NOT NULL,
[FieldA] [decimal](38, 10) NULL,
[FieldB] [decimal](38, 10) NULL,
[FieldZ] [decimal](38, 10) NULL
);
INSERT INTO @TMain ([ID],[Version],[Name],[dt],[FieldA],[FieldB],[FieldZ]) VALUES
(1,1,'Foo','20120101',23,23 ,25334123),
(2,2,'Foo','20120101',23,NULL,NULL),
(3,2,'Bar','20120303',24,123 ,NULL),
(4,2,'Bee','20120303',34,-34 ,NULL);
DECLARE @TInput TABLE (
[Name] [nvarchar](50) NOT NULL,
[dt] [date] NOT NULL,
[FieldA] [decimal](38, 10) NULL,
[FieldB] [decimal](38, 10) NULL,
[FieldZ] [decimal](38, 10) NULL
);
INSERT INTO @TInput ([Name],[dt],[FieldA],[FieldB],[FieldZ]) VALUES
('Foo','20120101',25,NULL,NULL),
('Foo','20120102',26,27 ,NULL),
('Bar','20120303',24,126 ,NULL),
('Baz','20120101',15,NULL,NULL);
DECLARE @VarVersion int = 2;WITH
CTE_Main
AS
(
SELECT
Main.ID
,Main.Version
,Main.Name
,Main.dt
,FieldName
,FieldValue
FROM
@TMain AS Main
CROSS APPLY
(
VALUES
('FieldA', Main.FieldA),
('FieldB', Main.FieldB),
('FieldZ', Main.FieldZ)
) AS CA(FieldName, FieldValue)
WHERE
Main.Version = @VarVersion
)
,CTE_Input
AS
(
SELECT
Input.Name
,Input.dt
,FieldName
,FieldValue
FROM
@TInput AS Input
CROSS APPLY
(
VALUES
('FieldA', Input.FieldA),
('FieldB', Input.FieldB),
('FieldZ', Input.FieldZ)
) AS CA(FieldName, FieldValue)
)
SELECT
ISNULL(CTE_Main.Name, CTE_Input.Name) AS FullName
,ISNULL(CTE_Main.dt, CTE_Input.dt) AS FullDate
,ISNULL(CTE_Main.FieldName, CTE_Input.FieldName) AS FullFieldName
,CTE_Main.FieldValue AS OldValue
,CTE_Input.FieldValue AS NewValue
FROM
CTE_Main
FULL JOIN CTE_Input ON
CTE_Input.Name = CTE_Main.Name
AND CTE_Input.dt = CTE_Main.dt
AND CTE_Input.FieldName = CTE_Main.FieldName
WHERE
(CTE_Main.FieldValue <> CTE_Input.FieldValue)
OR (CTE_Main.FieldValue IS NULL AND CTE_Input.FieldValue IS NOT NULL)
OR (CTE_Main.FieldValue IS NOT NULL AND CTE_Input.FieldValue IS NULL)
--ORDER BY FullName, FullDate, FullFieldName;FullName FullDate FullFieldName OldValue NewValue
Foo 2012-01-01 FieldA 23.0000000000 25.0000000000
Foo 2012-01-02 FieldA NULL 26.0000000000
Foo 2012-01-02 FieldB NULL 27.0000000000
Bar 2012-03-03 FieldB 123.0000000000 126.0000000000
Baz 2012-01-01 FieldA NULL 15.0000000000
Bee 2012-03-03 FieldB -34.0000000000 NULL
Bee 2012-03-03 FieldA 34.0000000000 NULLContext
StackExchange Database Administrators Q#111223, answer score: 5
Revisions (0)
No revisions yet.