HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Selecting only one duplicate

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
duplicateoneonlyselecting

Problem

I have a database of purchase orders and stockroom checkouts, and it seems that the stockroom checkouts tends to get duplicates inserted (we source this data from an older system nightly). What would be the select to get all rows from this table, but only select 1 of the duplicates at a time? I have around 100k duplicates, and my DB guy is still on christmas vacation :|

the tables do have unique keys, but other than the keys the data is duplicated a few thousand times. I'd like to avoid deleting the data (just in case). Server is running MSSQL2012, and i'll edit this post with a table structure when I get back to my desk.

Solution

You can use a CTE for this, if you want the row that's returned to be a complete, intact row rather than aggregates of any of the other columns. You can change the ORDER BY to prefer rows by any of the columns (the grouping is by the ones you think should be unique).

;WITH x AS
(
  SELECT col1, col2, col3, 
    rn = ROW_NUMBER() OVER 
    (
        PARTITION BY unique_columns 
        ORDER BY unique_columns, tie_breaker_if_you_care
    )
  FROM dbo.source_table
)
SELECT col1, col2, col3 FROM x WHERE rn = 1;

Code Snippets

;WITH x AS
(
  SELECT col1, col2, col3, 
    rn = ROW_NUMBER() OVER 
    (
        PARTITION BY unique_columns 
        ORDER BY unique_columns, tie_breaker_if_you_care
    )
  FROM dbo.source_table
)
SELECT col1, col2, col3 FROM x WHERE rn = 1;

Context

StackExchange Database Administrators Q#125133, answer score: 10

Revisions (0)

No revisions yet.