HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Checking for dupes on Compact Insert Statement

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
insertstatementcheckingdupesforcompact

Problem

I'm not sure if that is the correct term "Compact Insert Statement" It is just what I have always heard it referred to. It goes as follows:

INSERT INTO [tblUsers]
       ([username], [password]) 
VALUES ('user1', 'pass1'), 
       ('user2', 'pass2')


Anyway We have a table of just over 5M rows, and are about to import some data, BUT the probability is high that about 75% of the data is duplicated (we buy data from multiple sources, but they share around 30-40% with each of our sources :/).

If I do a unique constraint on the column, the entire insert fails from that point forward (unless wrapped in a transaction, of course).

I am just at a loss for how to do this efficiently and with code that can be reused going forward.

Solution

Why insert into tblusers directly at all?

I always use staging tables. You can use SSIS of course for the same result at with greater complexity

INSERT INTO [staging].[Users]
       ([username], [password]) 
VALUES ('user1', 'pass1'), 
       ('user2', 'pass2')

INSERT INTO [tblUsers]
       ([username], [password]) 
SELECT DISTINCT [username], [password] --edit, added DISTINCT 
FROM [staging].[Users] SU
WHERE NOT EXISTS (SELECT * FROM [tblUsers] U WHERE ...)

Code Snippets

INSERT INTO [staging].[Users]
       ([username], [password]) 
VALUES ('user1', 'pass1'), 
       ('user2', 'pass2')

INSERT INTO [tblUsers]
       ([username], [password]) 
SELECT DISTINCT [username], [password] --edit, added DISTINCT 
FROM [staging].[Users] SU
WHERE NOT EXISTS (SELECT * FROM [tblUsers] U WHERE ...)

Context

StackExchange Database Administrators Q#3210, answer score: 7

Revisions (0)

No revisions yet.