HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Quick way to validate two tables against each other

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
validatetableseachwayagainsttwoquickother

Problem

We're doing an ETL process. When all is said and done there are a bunch of tables that should be identical. What is the quickest way to verify that those tables (on two different servers) are in fact identical. I'm talking both schema and data.

Can I do a hash on the table it's self like I would be able to on an individual file or filegroup - to compare one to the other. We have Red-Gate data compare but since the tables in question contain millions of rows each I'd like something a little more performant.

One approach that intrigues me is this creative use of the union statement. But, I'd like to explore the hash idea a little further if possible.

POST ANSWER UPDATE

For any future vistors... here is the exact approach I ended up taking. It worked so well we're doing it on every table in each database. Thanks to answers below for pointing me in the right direction.

```
CREATE PROCEDURE [dbo].[usp_DatabaseValidation]
@TableName varchar(50)

AS
BEGIN

SET NOCOUNT ON;

-- parameter = if no table name was passed do them all, otherwise just check the one

-- create a temp table that lists all tables in target database

CREATE TABLE #ChkSumTargetTables ([fullname] varchar(250), [name] varchar(50), chksum int);
INSERT INTO #ChkSumTargetTables ([fullname], [name], [chksum])
SELECT DISTINCT
'[MyDatabase].[' + S.name + '].['
+ T.name + ']' AS [fullname],
T.name AS [name],
0 AS [chksum]
FROM MyDatabase.sys.tables T
INNER JOIN MyDatabase.sys.schemas S ON T.schema_id = S.schema_id
WHERE
T.name like IsNull(@TableName,'%');

-- create a temp table that lists all tables in source database

CREATE TABLE #ChkSumSourceTables ([fullname] varchar(250), [name] varchar(50), chksum int)
INSERT INTO #ChkSumSourceTables ([fullname], [name], [chksum])
SELECT DISTINCT
'[MyLinkedServer].[MyDatabase].[' + S.name + '].['

Solution

Here's what I've done before:

(SELECT 'TableA', * FROM TableA
EXCEPT
SELECT 'TableA', * FROM TableB)
UNION ALL
(SELECT 'TableB', * FROM TableB
EXCEPT
SELECT 'TableB', * FROM TableA)


It's worked well enough on tables that are about 1,000,000 rows, but I'm not sure how well that would work on extremely large tables.

Added:

I've run the query against my system which compares two tables with 21 fields of regular types in two different databases attached to the same server running SQL Server 2005. The table has about 3 million rows, and there's about 25000 rows different. The primary key on the table is weird, however, as it's a composite key of 10 fields (it's an audit table).

The execution plans for the queries has a total cost of 184.25879 for UNION and 184.22983 for UNION ALL. The tree cost only differs on the last step before returning rows, the concatenation.

Actually executing either query takes about 42s plus about 3s to actually transmit the rows. The time between the two queries is identical.

Second Addition:

This is actually extremely fast, each one running against 3 million rows in about 2.5s:

SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM TableA

SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM TableB


If the results of those don't match, you know the tables are different. However, if the results do match, you're not guaranteed that the tables are identical because of the [highly unlikely] chance of checksum collisions.

I'm not sure how datatype changes between tables would affect this calculation. I would run the query against the system views or information_schema views.

I tried the query against another table with 5 million rows and that one ran in about 5s, so it appears to be largely O(n).

Code Snippets

(SELECT 'TableA', * FROM TableA
EXCEPT
SELECT 'TableA', * FROM TableB)
UNION ALL
(SELECT 'TableB', * FROM TableB
EXCEPT
SELECT 'TableB', * FROM TableA)
SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM TableA

SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM TableB

Context

StackExchange Database Administrators Q#34356, answer score: 19

Revisions (0)

No revisions yet.