patternsqlMinor
Filtering data ordered by rowversion
Viewed 0 times
rowversiondataorderedfiltering
Problem
I have a SQL table of data with the following structure :
The number of distinct Ids ranges from 3000 to 50000.
The size of the table varies up to over a billion rows.
One Id can cover between a few rows up to 5% of the table.
The single most executed query on this table is :
I now have to implement incremental retrieval of data on a subset of Ids, including updates.
I then used a request scheme in which the caller provide a specific rowversion, retrieve a block of data and use the maximum rowversion value of the returned data for the subsequent call.
I have written this procedure :
If it is close to the end of data (ordered by RV) the query is instantaneous and if not the query can take up to minutes (nev
CREATE TABLE Data(
Id uniqueidentifier NOT NULL,
Date datetime NOT NULL,
Value decimal(20, 10) NULL,
RV timestamp NOT NULL,
CONSTRAINT PK_Data PRIMARY KEY CLUSTERED (Id, Date)
)
The number of distinct Ids ranges from 3000 to 50000.
The size of the table varies up to over a billion rows.
One Id can cover between a few rows up to 5% of the table.
The single most executed query on this table is :
SELECT Id, Date, Value, RV
FROM Data
WHERE Id = @Id
AND Date Between @StartDate AND @StopDate
I now have to implement incremental retrieval of data on a subset of Ids, including updates.
I then used a request scheme in which the caller provide a specific rowversion, retrieve a block of data and use the maximum rowversion value of the returned data for the subsequent call.
I have written this procedure :
CREATE TYPE guid_list_tbltype AS TABLE (Id uniqueidentifier not null primary key)
CREATE PROCEDURE GetData
@Ids guid_list_tbltype READONLY,
@Cursor rowversion,
@MaxRows int
AS
BEGIN
SELECT A.*
FROM (
SELECT
Data.Id,
Date,
Value,
RV,
ROW_NUMBER() OVER (ORDER BY RV) AS RN
FROM Data
inner join (SELECT Id FROM @Ids) Ids ON Ids.Id = Data.Id
WHERE RV > @Cursor
) A
WHERE RN
Where @MaxRows will range between 500,000 and 2,000,000 depending on how chunked the client will want his data.
I have tried different approaches :
- Indexing on (Id, RV) :
CREATE NONCLUSTERED INDEX IDX_IDRV ON Data(Id, RV) INCLUDE(Date, Value);
Using the index, the query seek the rows where RV = @Cursor for each Id in @Ids, read the following rows then merge the result and sort.
Efficiency then depends on the relative position of @Cursor` value.If it is close to the end of data (ordered by RV) the query is instantaneous and if not the query can take up to minutes (nev
Solution
One solution is for the client application to remember the maximum
The query in the procedure can then be rewritten to use the
You should get a post-execution query plan like this (estimated plan will be serial):
rowversion per ID. The user-defined table type would change to:CREATE TYPE
dbo.guid_list_tbltype
AS TABLE
(
Id uniqueidentifier PRIMARY KEY,
LastRV rowversion NOT NULL
);The query in the procedure can then be rewritten to use the
APPLY pattern (see my SQLServerCentral articles part 1 and part 2 - free login required). The key to good performance here is the ORDER BY - it avoids unordered pre-fetching on the nested loops join. The RECOMPILE is necessary to allow the optimizer to see the cardinality of the table variable at compilation time (probably resulting in a desirable parallel plan).ALTER PROCEDURE dbo.GetData
@IDs guid_list_tbltype READONLY,
@MaxRows bigint
AS
BEGIN
SELECT TOP (@MaxRows)
d.Id,
d.[Date],
d.Value,
d.RV
FROM @Ids AS i
CROSS APPLY
(
SELECT
d.*
FROM dbo.Data AS d
WHERE
d.Id = i.Id
AND d.RV > i.LastRV
) AS d
ORDER BY
i.Id,
d.RV
OPTION (RECOMPILE);
END;You should get a post-execution query plan like this (estimated plan will be serial):
Code Snippets
CREATE TYPE
dbo.guid_list_tbltype
AS TABLE
(
Id uniqueidentifier PRIMARY KEY,
LastRV rowversion NOT NULL
);ALTER PROCEDURE dbo.GetData
@IDs guid_list_tbltype READONLY,
@MaxRows bigint
AS
BEGIN
SELECT TOP (@MaxRows)
d.Id,
d.[Date],
d.Value,
d.RV
FROM @Ids AS i
CROSS APPLY
(
SELECT
d.*
FROM dbo.Data AS d
WHERE
d.Id = i.Id
AND d.RV > i.LastRV
) AS d
ORDER BY
i.Id,
d.RV
OPTION (RECOMPILE);
END;Context
StackExchange Database Administrators Q#22129, answer score: 5
Revisions (0)
No revisions yet.