snippetMinor
How can I make this aggregation query more efficient?
Viewed 0 times
thiscanquerymakemoreefficienthowaggregation
Problem
I have a query that's taking quite a long time on my machine (7 minutes) to execute and I was wondering whether I could make it (significantly) quicker:
What it's doing is extracting data to be put in an Excel spreadsheet report (whether a spreadsheet is a good presentation of this data is outside the scope of this question :-) )
The first subquery pulls out records optionally filtered by a list of classification IDs. These are then cross joined with the user table, because each user table row actually contains the info we really need for this: the user's organization name. I then left join the maintenance job history table (storing an entry for each record download) in order to create multiple rows if a record has been accessed multiple times, then group
SELECT
rec.[Id] AS RecordId,
MIN(rec.[CreationDate]) AS RecordCreationDate,
MIN(rec.[LastModified]) AS RecordLastModified,
MIN(rec.[AssetType]) AS RecordAssetType,
MIN(rec.[MasterFilename]) AS RecordMasterFilename,
MIN(rec.[GameName]) AS RecordGameName,
usr.[OrganizationName],
COUNT(hist.[Id]) AS TimesDownloaded
FROM
(
SELECT
innerRec.Id,
MIN(innerRec.CreationDate) AS CreationDate,
MIN(innerRec.LastModified) AS LastModified,
MIN(innerRec.AssetType) AS AssetType,
MIN(innerRec.MasterFilename) AS MasterFilename,
MIN(innerRec.GameName) AS GameName
FROM
[dbo].[Record] innerRec INNER JOIN [dbo].[RecordClassificationLink] innerLnk ON innerRec.Id = innerLnk.RecordId
-- WHERE (classification ID is foo or bar)
GROUP BY
innerRec.Id
-- HAVING COUNT(innerLnk.ClassificationId) = (number of specified classifications)
) rec
CROSS JOIN
[dbo].[AdamUser] usr
LEFT JOIN
(SELECT * FROM [dbo].[MaintenanceJobHistory] WHERE [CreatedOn] > '2016-01-01 00:00:00' AND [CreatedOn] < '2016-12-01 00:00:00') hist ON usr.Name = hist.AccessingUser AND rec.Id = hist.RecordId
GROUP BY
rec.Id, usr.OrganizationNameWhat it's doing is extracting data to be put in an Excel spreadsheet report (whether a spreadsheet is a good presentation of this data is outside the scope of this question :-) )
The first subquery pulls out records optionally filtered by a list of classification IDs. These are then cross joined with the user table, because each user table row actually contains the info we really need for this: the user's organization name. I then left join the maintenance job history table (storing an entry for each record download) in order to create multiple rows if a record has been accessed multiple times, then group
Solution
The stuff that is probably taking a lot of time is a large number of
Here are some index suggestions that I think would get you started:
Then, you could modify your query a little bit to help the optimizer make some smart choices, like aggregating certain data streams before they're joined and create a much larger product that will take more time to aggregate:
I've done the following:
In my mind, that should give you the same result, but it's late here, so you'll have to verify the results yourself. :)
Here's my query plan:
EDIT: You could also simplify the
... and the plan looks a tiny bit better as well (look for the red
Sort operations in your query plan. You can preempt those by sorting the data yourself, in the form of indexes.Here are some index suggestions that I think would get you started:
CREATE INDEX IX_RecordId
ON dbo.RecordClassificationLink (RecordId);
CREATE INDEX IX_MaintenanceJobHistory_ByUser
ON dbo.MaintenanceJobHistory (AccessingUser, RecordId);
CREATE UNIQUE INDEX UIX_AdamUser_OrgnName
ON dbo.AdamUser (OrganizationName, Name);Then, you could modify your query a little bit to help the optimizer make some smart choices, like aggregating certain data streams before they're joined and create a much larger product that will take more time to aggregate:
SELECT
rec.[Id] AS RecordId,
MIN(rec.[CreationDate]) AS RecordCreationDate,
MIN(rec.[LastModified]) AS RecordLastModified,
MIN(rec.[AssetType]) AS RecordAssetType,
MIN(rec.[MasterFilename]) AS RecordMasterFilename,
MIN(rec.[GameName]) AS RecordGameName,
usr.[OrganizationName],
SUM(ISNULL(hist._count, 0)) AS TimesDownloaded
FROM
(
SELECT
innerRec.Id,
MIN(innerRec.CreationDate) AS CreationDate,
MIN(innerRec.LastModified) AS LastModified,
MIN(innerRec.AssetType) AS AssetType,
MIN(innerRec.MasterFilename) AS MasterFilename,
MIN(innerRec.GameName) AS GameName
FROM [dbo].[Record] innerRec
INNER JOIN [dbo].[RecordClassificationLink] innerLnk ON innerRec.Id = innerLnk.RecordId
-- WHERE (classification ID is foo or bar)
GROUP BY
innerRec.Id
-- HAVING COUNT(innerLnk.ClassificationId) = (number of specified classifications)
) AS rec
CROSS JOIN [dbo].[AdamUser] AS usr
LEFT JOIN
(SELECT AccessingUser, RecordId, COUNT(*) AS _count
FROM [dbo].[MaintenanceJobHistory]
WHERE [CreatedOn] > '2016-01-01 00:00:00' AND
[CreatedOn] < '2016-12-01 00:00:00'
GROUP BY AccessingUser, RecordId
) AS hist ON
usr.Name = hist.AccessingUser AND
rec.Id = hist.RecordId
GROUP BY
rec.Id, usr.OrganizationName;I've done the following:
- the
histsubquery is aggregated onAccessingUser, RecordId, and I've created aCOUNT(*) AS _count. This query uses the new indexIX_MaintenanceJobHistory_ByUserto perform very efficiently without any memory grant or hash tables.
- Because of this, I've replaced
COUNT(hist.Id)withSUM(ISNULL(hist._count, 0)) AS TimesDownloaded
- The new index on
dbo.RecordClassificationLinkhelps perform a smooth join with theRecordtable, but if you add yourWHEREandHAVING, that index won't help you.
- The new index on
dbo.AdamUseralso improves performance by eliminating a Sort operator - because you aggregate on theOrganizationNamecolumn, might as well have your data sorted on this from the get-go.
In my mind, that should give you the same result, but it's late here, so you'll have to verify the results yourself. :)
Here's my query plan:
EDIT: You could also simplify the
rec part a bit - probably be a bit easier to read:SELECT
rec.[Id] AS RecordId,
rec.[CreationDate] AS RecordCreationDate,
rec.[LastModified] AS RecordLastModified,
rec.[AssetType] AS RecordAssetType,
rec.[MasterFilename] AS RecordMasterFilename,
rec.[GameName] AS RecordGameName,
usr.[OrganizationName],
SUM(ISNULL(hist._count, 0)) AS TimesDownloaded
FROM
(
SELECT
Id,
CreationDate,
LastModified,
AssetType,
MasterFilename,
GameName
FROM [dbo].[Record]
WHERE Id IN (
SELECT RecordId
FROM [dbo].[RecordClassificationLink]
--- WHERE ClassificationId=...
)
) AS rec
CROSS JOIN [dbo].[AdamUser] AS usr
LEFT JOIN
(SELECT AccessingUser, RecordId, COUNT(*) AS _count
FROM [dbo].[MaintenanceJobHistory]
WHERE [CreatedOn] > '2016-01-01 00:00:00' AND
[CreatedOn] < '2016-12-01 00:00:00'
GROUP BY AccessingUser, RecordId
) AS hist ON
usr.Name = hist.AccessingUser AND
rec.Id = hist.RecordId
GROUP BY
usr.OrganizationName,
rec.[Id],
rec.[CreationDate],
rec.[LastModified],
rec.[AssetType],
rec.[MasterFilename],
rec.[GameName],
usr.[OrganizationName];... and the plan looks a tiny bit better as well (look for the red
Lazy spool at the bottom of the original plan, that is now gone.)Code Snippets
CREATE INDEX IX_RecordId
ON dbo.RecordClassificationLink (RecordId);
CREATE INDEX IX_MaintenanceJobHistory_ByUser
ON dbo.MaintenanceJobHistory (AccessingUser, RecordId);
CREATE UNIQUE INDEX UIX_AdamUser_OrgnName
ON dbo.AdamUser (OrganizationName, Name);SELECT
rec.[Id] AS RecordId,
MIN(rec.[CreationDate]) AS RecordCreationDate,
MIN(rec.[LastModified]) AS RecordLastModified,
MIN(rec.[AssetType]) AS RecordAssetType,
MIN(rec.[MasterFilename]) AS RecordMasterFilename,
MIN(rec.[GameName]) AS RecordGameName,
usr.[OrganizationName],
SUM(ISNULL(hist._count, 0)) AS TimesDownloaded
FROM
(
SELECT
innerRec.Id,
MIN(innerRec.CreationDate) AS CreationDate,
MIN(innerRec.LastModified) AS LastModified,
MIN(innerRec.AssetType) AS AssetType,
MIN(innerRec.MasterFilename) AS MasterFilename,
MIN(innerRec.GameName) AS GameName
FROM [dbo].[Record] innerRec
INNER JOIN [dbo].[RecordClassificationLink] innerLnk ON innerRec.Id = innerLnk.RecordId
-- WHERE (classification ID is foo or bar)
GROUP BY
innerRec.Id
-- HAVING COUNT(innerLnk.ClassificationId) = (number of specified classifications)
) AS rec
CROSS JOIN [dbo].[AdamUser] AS usr
LEFT JOIN
(SELECT AccessingUser, RecordId, COUNT(*) AS _count
FROM [dbo].[MaintenanceJobHistory]
WHERE [CreatedOn] > '2016-01-01 00:00:00' AND
[CreatedOn] < '2016-12-01 00:00:00'
GROUP BY AccessingUser, RecordId
) AS hist ON
usr.Name = hist.AccessingUser AND
rec.Id = hist.RecordId
GROUP BY
rec.Id, usr.OrganizationName;SELECT
rec.[Id] AS RecordId,
rec.[CreationDate] AS RecordCreationDate,
rec.[LastModified] AS RecordLastModified,
rec.[AssetType] AS RecordAssetType,
rec.[MasterFilename] AS RecordMasterFilename,
rec.[GameName] AS RecordGameName,
usr.[OrganizationName],
SUM(ISNULL(hist._count, 0)) AS TimesDownloaded
FROM
(
SELECT
Id,
CreationDate,
LastModified,
AssetType,
MasterFilename,
GameName
FROM [dbo].[Record]
WHERE Id IN (
SELECT RecordId
FROM [dbo].[RecordClassificationLink]
--- WHERE ClassificationId=...
)
) AS rec
CROSS JOIN [dbo].[AdamUser] AS usr
LEFT JOIN
(SELECT AccessingUser, RecordId, COUNT(*) AS _count
FROM [dbo].[MaintenanceJobHistory]
WHERE [CreatedOn] > '2016-01-01 00:00:00' AND
[CreatedOn] < '2016-12-01 00:00:00'
GROUP BY AccessingUser, RecordId
) AS hist ON
usr.Name = hist.AccessingUser AND
rec.Id = hist.RecordId
GROUP BY
usr.OrganizationName,
rec.[Id],
rec.[CreationDate],
rec.[LastModified],
rec.[AssetType],
rec.[MasterFilename],
rec.[GameName],
usr.[OrganizationName];Context
StackExchange Database Administrators Q#157602, answer score: 3
Revisions (0)
No revisions yet.