patternsqlMinor
Inline Select Causing Incredibly slow query performance
Viewed 0 times
queryslowcausingperformanceinlineselectincredibly
Problem
I have a query that is used to populate an aggregate table for reporting purposes. The query comes from another developer here at the company I work for but is my job to make it run fast. So far all of my early attempts failed. I've tried several things with this query and this is where I am at the moment. I have shaved about a half an hour of it's load time already but am stuck and thinking I will probably just have to re-do the whole thing. I'm hoping someone on here can see what if I'm missing anything and give me some pointers on how to fix this query.
```
SELECT P.CompanyID,
P.CompanyName,
P.StoreID,
P.StoreName,
P.ReportDate,
Isnull((SELECT Sum(FT.GrossSales - Isnull(FP.PaymentAmount, 0)) AS PullNet
FROM FactSalesTransaction AS FT
LEFT JOIN (SELECT TransactionID,
DimStoreID,
DimBusinessDateID,
Sum(PaymentAmount) AS PaymentAmount
FROM FactSalesPayment
WHERE DimPaymentTypeID <> 2
AND ModStatusFlg <> 'D'
GROUP BY TransactionID,
DimStoreID,
DimBusinessDateID) AS FP
ON FP.TransactionID = FT.TransactionID
AND FP.DimStoreID = FT.DimStoreID
INNER JOIN DimCalendar AS C
ON FT.DimBusinessDateID = C.DimCalendarID
AND FP.DimBusinessDateID = C.DimCalendarID AND C.CalendarDate >= '12/4/2012'
WHERE FT.DimStoreID = P.DimStoreID
AND FT.DimBusinessDateID = P.DimBusinessDateID
AND FT.ModStatusFlg <> 'D'), 0) AS StoreCash,
SR.CashDeposit AS StoreResp,
SN.StoreNet,
```
SELECT P.CompanyID,
P.CompanyName,
P.StoreID,
P.StoreName,
P.ReportDate,
Isnull((SELECT Sum(FT.GrossSales - Isnull(FP.PaymentAmount, 0)) AS PullNet
FROM FactSalesTransaction AS FT
LEFT JOIN (SELECT TransactionID,
DimStoreID,
DimBusinessDateID,
Sum(PaymentAmount) AS PaymentAmount
FROM FactSalesPayment
WHERE DimPaymentTypeID <> 2
AND ModStatusFlg <> 'D'
GROUP BY TransactionID,
DimStoreID,
DimBusinessDateID) AS FP
ON FP.TransactionID = FT.TransactionID
AND FP.DimStoreID = FT.DimStoreID
INNER JOIN DimCalendar AS C
ON FT.DimBusinessDateID = C.DimCalendarID
AND FP.DimBusinessDateID = C.DimCalendarID AND C.CalendarDate >= '12/4/2012'
WHERE FT.DimStoreID = P.DimStoreID
AND FT.DimBusinessDateID = P.DimBusinessDateID
AND FT.ModStatusFlg <> 'D'), 0) AS StoreCash,
SR.CashDeposit AS StoreResp,
SN.StoreNet,
Solution
Okay so this is what I actually did to make this query that used to run in about an hour and a half run in under a minute. First I did a little more digging to see exactly what this was doing. There are a few major sub selects in the outer most portion of the query. They look like this.
These select statements are joining two of my largest tables. FactSalesTransaction(105 Million Records) to FactSalesPayment(102 Million records) and it is doing so in a select within a select. Which essentially means that for every row returned it is executing this query. Well this query is usually run for about seven days worth of data and therefore returns about 19,000 Records. That Mean's the 3 sub selects to these massive tables need to be executed 19,000 times. Bingo I think I've found where my performance loss was. So I switched those queries to a left join. Nothing complicated just made it so they only had to join once. The left join to replace looks like this.
As you can see I really didn't change much to the select itself just altered it to not run 19,000 times. The next thing I did was altered the query to a stored procedure where a take the date range the user or in this case the ETL process gives(usually 7 days back) Select from DimCalendar the DimCalendarID for that day so the query is using integers instead of date times and overall has less records to join. Making the final query look like this.
```
CREATE PROCEDURE [dbo].[NoneOfYourBusinessWhatINamedIt]
@ETLLoadDate DATETIME
AS
BEGIN
DECLARE @DimCal Int = (SELECT DimCalendarID FROM DimCalendar WHERE CalendarDate = @ETLLoadDate)
END
SELECT P.CompanyID,
P.CompanyName,
P.StoreID,
P.StoreName,
P.ReportDate,
FP.StoreCash,
SR.CashDeposit AS StoreResp,
SN.StoreNet,
P.DimEmployeeID AS EmpID,
P.EmpName,
P.RegisterID,
P.PullNumber,
ISNULL(SUM(PC.PullCash), 0) AS PullCash,
P.PullResp AS PullResp,
ISNULL(SUM(PN.PullNet),0) AS PullNet
FROM (SELECT C.CompanyID,
C.CompanyName,
S.StoreID,
S.StoreName,
F.DimEmployeeID,
E.FirstName + ' ' + E.LastName AS EmpName,
CASE
WHEN F.PullDrawerStartTime <> '1900-01-01' THEN F.PullDrawerStartTime
ELSE Isnull(Cast((SELECT TOP 1 Dateadd(SECOND, 1, PullDrawerEndTime)
FROM FactPullDrawer
WHERE PullDrawerEndTime 'D' AND DimBusinessDateID = @DimCal
GROUP BY DimStoreID,
DimBusinessDateID) AS SN
ON SN.DimStoreID =
Isnull((SELECT Sum(FT.GrossSales - Isnull(FP.PaymentAmount, 0)) AS PullNet
FROM FactSalesTransaction AS FT
LEFT JOIN (SELECT TransactionID,
DimStoreID,
DimBusinessDateID,
Sum(PaymentAmount) AS PaymentAmount
FROM FactSalesPayment
WHERE DimPaymentTypeID <> 2
AND ModStatusFlg <> 'D'
GROUP BY TransactionID,
DimStoreID,
DimBusinessDateID) AS FP
ON FP.TransactionID = FT.TransactionID
AND FP.DimStoreID = FT.DimStoreID
INNER JOIN DimCalendar AS C
ON FT.DimBusinessDateID = C.DimCalendarID
AND FP.DimBusinessDateID = C.DimCalendarID AND C.CalendarDate >= '12/4/2012'
WHERE FT.DimStoreID = P.DimStoreID
AND FT.DimBusinessDateID = P.DimBusinessDateID
AND FT.ModStatusFlg <> 'D'), 0) AS StoreCashThese select statements are joining two of my largest tables. FactSalesTransaction(105 Million Records) to FactSalesPayment(102 Million records) and it is doing so in a select within a select. Which essentially means that for every row returned it is executing this query. Well this query is usually run for about seven days worth of data and therefore returns about 19,000 Records. That Mean's the 3 sub selects to these massive tables need to be executed 19,000 times. Bingo I think I've found where my performance loss was. So I switched those queries to a left join. Nothing complicated just made it so they only had to join once. The left join to replace looks like this.
LEFT JOIN (SELECT Sum(FT.GrossSales - Isnull(FP.PaymentAmount, 0)) AS StoreCash, FT.DimStoreID, FT.DimBusinessDateID
FROM FactSalesTransaction AS FT
LEFT JOIN (SELECT TransactionID,
DimStoreID,
Sum(PaymentAmount) AS PaymentAmount
FROM FactSalesPayment
WHERE DimPaymentTypeID <> 2
AND ModStatusFlg <> 'D'
GROUP BY TransactionID,
DimStoreID) AS FP
ON FP.TransactionID = FT.TransactionID
AND FP.DimStoreID = FT.DimStoreID
WHERE FT.ModStatusFlg <> 'D'
GROUP BY FT.DimStoreID, FT.DimBusinessDateID) AS FP
ON FP.DimStoreID = P.DimStoreID
AND FP.DimBusinessDateID = P.DimBusinessDateIDAs you can see I really didn't change much to the select itself just altered it to not run 19,000 times. The next thing I did was altered the query to a stored procedure where a take the date range the user or in this case the ETL process gives(usually 7 days back) Select from DimCalendar the DimCalendarID for that day so the query is using integers instead of date times and overall has less records to join. Making the final query look like this.
```
CREATE PROCEDURE [dbo].[NoneOfYourBusinessWhatINamedIt]
@ETLLoadDate DATETIME
AS
BEGIN
DECLARE @DimCal Int = (SELECT DimCalendarID FROM DimCalendar WHERE CalendarDate = @ETLLoadDate)
END
SELECT P.CompanyID,
P.CompanyName,
P.StoreID,
P.StoreName,
P.ReportDate,
FP.StoreCash,
SR.CashDeposit AS StoreResp,
SN.StoreNet,
P.DimEmployeeID AS EmpID,
P.EmpName,
P.RegisterID,
P.PullNumber,
ISNULL(SUM(PC.PullCash), 0) AS PullCash,
P.PullResp AS PullResp,
ISNULL(SUM(PN.PullNet),0) AS PullNet
FROM (SELECT C.CompanyID,
C.CompanyName,
S.StoreID,
S.StoreName,
F.DimEmployeeID,
E.FirstName + ' ' + E.LastName AS EmpName,
CASE
WHEN F.PullDrawerStartTime <> '1900-01-01' THEN F.PullDrawerStartTime
ELSE Isnull(Cast((SELECT TOP 1 Dateadd(SECOND, 1, PullDrawerEndTime)
FROM FactPullDrawer
WHERE PullDrawerEndTime 'D' AND DimBusinessDateID = @DimCal
GROUP BY DimStoreID,
DimBusinessDateID) AS SN
ON SN.DimStoreID =
Code Snippets
Isnull((SELECT Sum(FT.GrossSales - Isnull(FP.PaymentAmount, 0)) AS PullNet
FROM FactSalesTransaction AS FT
LEFT JOIN (SELECT TransactionID,
DimStoreID,
DimBusinessDateID,
Sum(PaymentAmount) AS PaymentAmount
FROM FactSalesPayment
WHERE DimPaymentTypeID <> 2
AND ModStatusFlg <> 'D'
GROUP BY TransactionID,
DimStoreID,
DimBusinessDateID) AS FP
ON FP.TransactionID = FT.TransactionID
AND FP.DimStoreID = FT.DimStoreID
INNER JOIN DimCalendar AS C
ON FT.DimBusinessDateID = C.DimCalendarID
AND FP.DimBusinessDateID = C.DimCalendarID AND C.CalendarDate >= '12/4/2012'
WHERE FT.DimStoreID = P.DimStoreID
AND FT.DimBusinessDateID = P.DimBusinessDateID
AND FT.ModStatusFlg <> 'D'), 0) AS StoreCashLEFT JOIN (SELECT Sum(FT.GrossSales - Isnull(FP.PaymentAmount, 0)) AS StoreCash, FT.DimStoreID, FT.DimBusinessDateID
FROM FactSalesTransaction AS FT
LEFT JOIN (SELECT TransactionID,
DimStoreID,
Sum(PaymentAmount) AS PaymentAmount
FROM FactSalesPayment
WHERE DimPaymentTypeID <> 2
AND ModStatusFlg <> 'D'
GROUP BY TransactionID,
DimStoreID) AS FP
ON FP.TransactionID = FT.TransactionID
AND FP.DimStoreID = FT.DimStoreID
WHERE FT.ModStatusFlg <> 'D'
GROUP BY FT.DimStoreID, FT.DimBusinessDateID) AS FP
ON FP.DimStoreID = P.DimStoreID
AND FP.DimBusinessDateID = P.DimBusinessDateIDCREATE PROCEDURE [dbo].[NoneOfYourBusinessWhatINamedIt]
@ETLLoadDate DATETIME
AS
BEGIN
DECLARE @DimCal Int = (SELECT DimCalendarID FROM DimCalendar WHERE CalendarDate = @ETLLoadDate)
END
SELECT P.CompanyID,
P.CompanyName,
P.StoreID,
P.StoreName,
P.ReportDate,
FP.StoreCash,
SR.CashDeposit AS StoreResp,
SN.StoreNet,
P.DimEmployeeID AS EmpID,
P.EmpName,
P.RegisterID,
P.PullNumber,
ISNULL(SUM(PC.PullCash), 0) AS PullCash,
P.PullResp AS PullResp,
ISNULL(SUM(PN.PullNet),0) AS PullNet
FROM (SELECT C.CompanyID,
C.CompanyName,
S.StoreID,
S.StoreName,
F.DimEmployeeID,
E.FirstName + ' ' + E.LastName AS EmpName,
CASE
WHEN F.PullDrawerStartTime <> '1900-01-01' THEN F.PullDrawerStartTime
ELSE Isnull(Cast((SELECT TOP 1 Dateadd(SECOND, 1, PullDrawerEndTime)
FROM FactPullDrawer
WHERE PullDrawerEndTime < F.PullDrawerEndTime
AND DimStoreID = F.DimStoreID
AND DimRegisterID = F.DimRegisterID
AND DimBusinessDateID = F.DimBusinessDateID
--AND DimBusinessDateID = @DimCal
ORDER BY PullDrawerEndTime DESC) AS DATETIME), BD.CalendarDate + Isnull(Cast(Cast(ST.SiteSettingValue AS TIME) AS DATETIME), Cast('4:00:00 AM' AS DATETIME)))
END AS PullDrawerStartTime,
F.PullDrawerEndTime,
BD.CalendarDate AS ReportDate,
R.RegisterID,
R.DimRegisterID,
(SELECT Count(PullDrawerEndTime)
FROM FactPullDrawer
WHERE PullDrawerEndTime < F.PullDrawerEndTime
AND DimStoreID = F.DimStoreID
AND DimRegisterID = F.DimRegisterID
AND DimBusinessDateID = F.DimBusinessDateID) + 1 AS PullNumber,
Isnull(F.Amount, 0) AS PullResp,
F.DimStoreID,
F.DimBusinessDateID
FROM FactPullDrawer AS F
INNER JOIN DimCompany AS C
ON C.DimCompanyID = F.DimCompanyID
INNER JOIN DimStore AS S
ON S.DimStoreID = F.DimStoreID
INNER JOIN DimCalendar AS BD
ON BD.DimCalendarID = F.DimBusinessDateID
INNER JOIN DimEmployee AS E
ON F.DimEmployeeID = E.DimEmployeeID
INNER JOIN DimRegister AS R
ON R.DimRegisterID = F.DimRegisterID
LEFT JOIN DimSiteSettings AS ST
Context
StackExchange Database Administrators Q#29891, answer score: 7
Revisions (0)
No revisions yet.