HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Query optimization issue on view with RANK on DateTime for most recent rows

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
rowsrankrecentwithqueryissueviewoptimizationformost

Problem

I'm having a problem with a view, that is designed to store sync data, for an iOS app, that syncs data back and forth to a web service. The XML fields are used to store both structured data for transposing into PDF documents, and iOS objects. There are frequent writes and reads to this table.

The table has around 600K records, and the data usage is quite high due to two large XML fields. The database table looks like this:

CREATE TABLE [dbo].[SyncData](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [CustomerId] [int] NOT NULL,
    [CategoryId] [int] NOT NULL,
    [SyncXml] [xml] NOT NULL,
    [SyncDump] [xml] NULL,
    [DateCreated] [datetime] NOT NULL CONSTRAINT [DF_SyncData_DateCreated]  DEFAULT (getdate()),
    [DateUpdated] [datetime] NOT NULL CONSTRAINT [DF_SyncData_DateUpdated]  DEFAULT (getdate()),
    [DELETED_FLAG] [bit] NOT NULL CONSTRAINT [DF_SyncData_DELETED_FLAG]  DEFAULT ((0)),
 CONSTRAINT [PK_SyncData] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

GO


The CustomerId and CategoryId columns have foreign keys.

The web service returns the most recent sync data, organized by category, for a specific customer:

CREATE VIEW [dbo].[vw_SyncDataRecent]
AS
    WITH Ranked AS (
        SELECT 
            Id, 
            CustomerId, 
            CategoryId, 
            ROW_NUMBER() OVER (
                PARTITION BY Id, CustomerId 
                ORDER BY DateCreated DESC) AS rn
        FROM SyncData AS cb
    )
    SELECT Id, CustomerId, CategoryId FROM Ranked 
    WHERE rn = 1
GO


The table has a couple of indexes on it already, which are possibly leftovers (this database has been in production since 2004 and has gone through lots of feature additions), so I'm not sure if they are even needed anymore:

```
CREATE NONCLUSTERED INDEX [IX_SyncData_CustomerId] ON [dbo].[SyncData]
(

Solution

My initial thoughts are that the view itself is the major issue, in
that it is ranking over the entire data set.

That is exactly it, the window function is applied to the whole dataset to then be filtered by CustomerId and rn=1. See my previous answer here for more information.

With the CustomerId filter predicate applied only when the data gets to the filter operator:

You could add OPTION(RECOMPILE) to the select query with the filter to see the CustomerId at runtime and filter before passing the data through the window function

SELECT * FROM vw_SyncDataRecent
WHERE CustomerId = 600528
OPTION(RECOMPILE);


Other methods as noted in my other answer are not using the view, using a tvf instead, not using a window function to do the filtering, stored procedures, ....

TVF example

CREATE FUNCTION dbo.[Fnc_SyncDataRecent]
(
    @P1 INT
)  
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
       WITH Ranked AS  (
    SELECT 
            Id, 
            CustomerId, 
            CategoryId, 
            ROW_NUMBER() OVER (
                PARTITION BY Id, CustomerId 
                ORDER BY DateCreated DESC) AS rn
        FROM dbo.SyncData AS cb
    )
    SELECT Id, CustomerId, CategoryId 
    FROM Ranked 
    WHERE rn = 1 and CustomerId = @P1

SELECT * FROM dbo.[Fnc_SyncDataRecent](600528);


I would also add this index:

CREATE INDEX IX_CustomerID3
ON dbo.SyncData([CustomerId])
INCLUDE(CategoryId,ID,DateCreated);


EDIT
Your stored procedure

CREATE PROCEDURE [dbo].[_SyncData_Recent]
(
@CustomerId int = null
)
AS
BEGIN
    WITH Ranked AS (
        SELECT 
            Id, 
            CustomerId, 
            CategoryId, 
            ROW_NUMBER() OVER (PARTITION BY CustomerId, CategoryId ORDER BY DateCreated DESC) AS rn
        FROM SyncData AS cb
        WHERE CustomerId = @CustomerId
        AND DELETED_FLAG = 0
    )
    SELECT Id, CustomerId, CategoryId FROM Ranked WHERE rn = 1 AND CustomerId = @CustomerId
END
GO

EXEC [dbo].[_SyncData_Recent] 600528;


Would also work to apply earlier filtering, but I would add this index:

CREATE INDEX IX_CustomerID_CategoryId_DateCreated_
ON dbo.SyncData([CustomerId],CategoryId,DateCreated DESC)
INCLUDE(DELETED_FLAG,ID)
WHERE DELETED_FLAG = 0;


to remove the sort operator.

And the last filter, WHERE rn = 1 AND CustomerId = @CustomerId could just be WHERE rn = 1.

Code Snippets

SELECT * FROM vw_SyncDataRecent
WHERE CustomerId = 600528
OPTION(RECOMPILE);
CREATE FUNCTION dbo.[Fnc_SyncDataRecent]
(
    @P1 INT
)  
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
       WITH Ranked AS  (
    SELECT 
            Id, 
            CustomerId, 
            CategoryId, 
            ROW_NUMBER() OVER (
                PARTITION BY Id, CustomerId 
                ORDER BY DateCreated DESC) AS rn
        FROM dbo.SyncData AS cb
    )
    SELECT Id, CustomerId, CategoryId 
    FROM Ranked 
    WHERE rn = 1 and CustomerId = @P1



SELECT * FROM dbo.[Fnc_SyncDataRecent](600528);
CREATE INDEX IX_CustomerID3
ON dbo.SyncData([CustomerId])
INCLUDE(CategoryId,ID,DateCreated);
CREATE PROCEDURE [dbo].[_SyncData_Recent]
(
@CustomerId int = null
)
AS
BEGIN
    WITH Ranked AS (
        SELECT 
            Id, 
            CustomerId, 
            CategoryId, 
            ROW_NUMBER() OVER (PARTITION BY CustomerId, CategoryId ORDER BY DateCreated DESC) AS rn
        FROM SyncData AS cb
        WHERE CustomerId = @CustomerId
        AND DELETED_FLAG = 0
    )
    SELECT Id, CustomerId, CategoryId FROM Ranked WHERE rn = 1 AND CustomerId = @CustomerId
END
GO

EXEC [dbo].[_SyncData_Recent] 600528;
CREATE INDEX IX_CustomerID_CategoryId_DateCreated_
ON dbo.SyncData([CustomerId],CategoryId,DateCreated DESC)
INCLUDE(DELETED_FLAG,ID)
WHERE DELETED_FLAG = 0;

Context

StackExchange Database Administrators Q#260706, answer score: 3

Revisions (0)

No revisions yet.