patternsqlMinor
Find records with duplicate serial number
Viewed 0 times
numberwithduplicaterecordsserialfind
Problem
I was tasked today with cleaning up data from a SSRS report put together by a DB programmer. The original report was to find duplicate serial number sales within a user-defined period of time. I was given an Excel file with the result data set and it smelled really funny, so I decided to go into SSRS and find out what the DBA did.
Turns out the person who requested the report did not explain it very well, and our data services team was overworked, so the result set was bogus and the report took over 30 minutes to execute. It looked like a desperate attempt to make a self-
So I decided to rewrite it.
Background
-
A certain number of orders were processed as duplicates resulting in multiple coverage sales for the same unit, and invoicing problems etc.
-
We wanted to sort out manually cancelled orders (
-
The report below runs in about 3 minutes. I know nothing about XML (so please don't crucify me, and I didn't write that code either) but I suspect the hundreds and hundreds of lines of formatting after the SQL query might bog it down. Unfortunately I have no way to benchmark it against SSMS performance with just the SQL, as I only have access to SSRS. Any advice on cleaning up redundant/unnecessary XML is welcome.
So here goes. Any tips on performance, best practices, etc. appreciated!
Original report: approx 550 rows returned
My updated version: 120 rows returned (for May '14)
```
0
/Data Sources/Reporting
None
REMOVED
DataSource1
=Parameters!startdate.Value
=Parameters!enddate.Value
Turns out the person who requested the report did not explain it very well, and our data services team was overworked, so the result set was bogus and the report took over 30 minutes to execute. It looked like a desperate attempt to make a self-
join-ish operation and had SELECT DISTINCT * which threw a big red flag. So I decided to rewrite it.
Background
-
A certain number of orders were processed as duplicates resulting in multiple coverage sales for the same unit, and invoicing problems etc.
-
We wanted to sort out manually cancelled orders (
Status = 'CAN') from deleted orders (Status = 'DEL'), with the deleted ones being what we were looking for, as deleted orders don't show up in any of the invoicing, service, etc. applications so they are more difficult to track down (by design). -
The report below runs in about 3 minutes. I know nothing about XML (so please don't crucify me, and I didn't write that code either) but I suspect the hundreds and hundreds of lines of formatting after the SQL query might bog it down. Unfortunately I have no way to benchmark it against SSMS performance with just the SQL, as I only have access to SSRS. Any advice on cleaning up redundant/unnecessary XML is welcome.
So here goes. Any tips on performance, best practices, etc. appreciated!
Original report: approx 550 rows returned
My updated version: 120 rows returned (for May '14)
```
0
/Data Sources/Reporting
None
REMOVED
DataSource1
=Parameters!startdate.Value
=Parameters!enddate.Value
Solution
I noticed one suspicious join condition:
Temporary tables are rarely a good idea, I think. They involve a lot of I/O, and they force a complex query to be executed in a certain way, rather than letting the query optimizer decide the join order.
Instead, I suggest using Common Table Expressions to manage complex queries.
cdo.Data_Source = cdo.Data_Source. That should always be true, unless both sides are NULL. Did you mean to say cdo.Data_Source IS NOT NULL? Or was that join condition totally superfluous? (I'll assume the latter.)Temporary tables are rarely a good idea, I think. They involve a lot of I/O, and they force a complex query to be executed in a certain way, rather than letting the query optimizer decide the join order.
Instead, I suggest using Common Table Expressions to manage complex queries.
WITH Relevant_Dealers AS (
SELECT Dealer_Id
FROM Servicer_Reporting.dbo.Dealer_Handling
WHERE Account_ID IN (102,108,103,107,138,139,142,147,148,162,163,169,170)
), Relevant_Contracts AS ( -- Your #ContractDupSerial
SELECT cdo.Serial_Number
, cdo.Model_Number
, cmo.Contract_Number
, cmo.Dealer_ID
, cmo.Status
, cmo.Status_Date
, cmo.CSR_ID
FROM Servicer_Reporting.dbo.Contract_Master_Original AS cmo
INNER JOIN Servicer_Reporting.dbo.contract_detail_original AS cdo
ON cdo.Contract_Number = cmo.contract_Number
WHERE cmo.Dealer_Id IN (SELECT Dealer_Id FROM Relevant_Dealers)
AND cmo.Creation_Date >= @startdate
AND cmo.Creation_Date 1
), Canceled_Contracts AS (
SELECT Serial_Number
FROM Relevant_Contracts
WHERE CSR_ID IS NOT NULL
)
SELECT *
FROM Relevant_Contracts
WHERE Serial_Number IS NOT NULL
AND Serial_Number IN (SELECT Serial_Number FROM Dup_SN)
AND Serial_Number NOT IN (SELECT Serial_Number FROM Canceled_Contracts)
-- Clean up bogus data...
AND Serial_Number <> ''
AND Serial_Number <> '.'
AND Serial_Number NOT LIKE 'N/A%'
AND Serial_Number NOT LIKE 'NA%'
AND Serial_Number NOT LIKE 'X%'
AND Serial_Number NOT LIKE '0%';
ORDER BY Serial_Number ASC, Contract_Number ASC;Code Snippets
WITH Relevant_Dealers AS (
SELECT Dealer_Id
FROM Servicer_Reporting.dbo.Dealer_Handling
WHERE Account_ID IN (102,108,103,107,138,139,142,147,148,162,163,169,170)
), Relevant_Contracts AS ( -- Your #ContractDupSerial
SELECT cdo.Serial_Number
, cdo.Model_Number
, cmo.Contract_Number
, cmo.Dealer_ID
, cmo.Status
, cmo.Status_Date
, cmo.CSR_ID
FROM Servicer_Reporting.dbo.Contract_Master_Original AS cmo
INNER JOIN Servicer_Reporting.dbo.contract_detail_original AS cdo
ON cdo.Contract_Number = cmo.contract_Number
WHERE cmo.Dealer_Id IN (SELECT Dealer_Id FROM Relevant_Dealers)
AND cmo.Creation_Date >= @startdate
AND cmo.Creation_Date < @enddate
), Dup_SN AS (
SELECT Serial_Number
FROM Relevant_Contracts
GROUP BY Serial_Number
HAVING COUNT(*) > 1
), Canceled_Contracts AS (
SELECT Serial_Number
FROM Relevant_Contracts
WHERE CSR_ID IS NOT NULL
)
SELECT *
FROM Relevant_Contracts
WHERE Serial_Number IS NOT NULL
AND Serial_Number IN (SELECT Serial_Number FROM Dup_SN)
AND Serial_Number NOT IN (SELECT Serial_Number FROM Canceled_Contracts)
-- Clean up bogus data...
AND Serial_Number <> ''
AND Serial_Number <> '.'
AND Serial_Number NOT LIKE 'N/A%'
AND Serial_Number NOT LIKE 'NA%'
AND Serial_Number NOT LIKE 'X%'
AND Serial_Number NOT LIKE '0%';
ORDER BY Serial_Number ASC, Contract_Number ASC;Context
StackExchange Code Review Q#54676, answer score: 6
Revisions (0)
No revisions yet.