HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

SQL Server table query with pagination performance tuning, understand the current solution

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
paginationthesqlwithquerytuningunderstandperformancecurrentserver

Problem

as stated in the title i start a performance tuning of a table query with pagination generated by a legacy program that use Linq To SQL as ORM.

I have found this resource in which is highly recommended to sort table before pagination :
https://rimdev.io/optimizing-linq-sql-skip-take/

So i have follow the suggestion provided and experimented an huge difference. It's clear to me that is somewhat related on how row_nummber is calculated but is not clear to me exactly what happened and why there is so much difference between the two queries.

Original slow query (dataset of ~7K elements, take ~3s) :

```
SELECT [t7].[ID], [t7].[ID_BRAND], [t7].[CODE], [t7].[CODFOR], [t7].[COD_ALT01], [t7].[COD_ALT02], [t7].[COD_ALT03], [t7].[ID_UOM], [t7].[IS_ACTIVE], [t7].[_ATTRIBUTES] AS [_ATTRIBUTES], [t7].[_DOCUMENTS] AS [_DOCUMENTS], [t7].[_SEO] AS [_SEO], [t7].[_TRANSLATIONS] AS [_TRANSLATIONS], [t7].[_TAGS] AS [_TAGS], [t7].[_NOTES] AS [_NOTES], [t7].[_METADATA] AS [_METADATA], [t7].[IS_B2B], [t7].[IS_B2C], [t7].[IS_PROMO], [t7].[IS_NEWS], [t7].[CAN_BE_RETURNED], [t7].[IS_SHIPPABLE], [t7].[HAS_SHIPPING_COSTS], [t7].[IS_PURCHEASABLE], [t7].[test], [t7].[ID2], [t7].[CODE2], [t7].[BUSINESS_NAME], [t7].[NAME], [t7].[PHONE_01], [t7].[PHONE_02], [t7].[PHONE_03], [t7].[FAX_01], [t7].[FAX_02], [t7].[COUNTRY_01], [t7].[CITY_01], [t7].[ADDRESS_01], [t7].[COUNTRY_02], [t7].[CITY_02], [t7].[ADDRESS_02], [t7].[EMAIL_01], [t7].[EMAIL_02], [t7].[PEC], [t7].[SITE_01], [t7].[SITE_02], [t7].[SITE_03], [t7].[SITE_04], [t7].[VAT_NUMBER], [t7].[SORT], [t7].[GROUPID_01], [t7].[IS_GROUPLEADER_01], [t7].[GROUPID_02], [t7].[IS_GROUPLEADER_02],[t7].[IS_ACTIVE2], [t7].[[_DOCUMENTS]]2] AS [_DOCUMENTS2], [t7].[[_SEO]]2] AS [_SEO2], [t7].[[_METADATA]]2] AS [_METADATA2], [t7].[test2], [t7].[ID3], [t7].[CODE3], [t7].[[_TRANSLATIONS]]2] AS [_TRANSLATIONS2], [t7].[[_METADATA]]3] AS [_METADATA3], [t7].[test3], [t7].[ID4], [t7].[ID_LINE], [t7].[ID_GROUP], [t7].[ID_CLASS], [t7].[ID_FAM], [t7].[ID_ARTICLE]
FROM (

Solution

das spool

The main difference between the two queries is the presence of an Eager Index Spool.

From the article:

Eager index spools can occur on the inner side of Nested Loops joins
to reduce the amount of work that needs to be done there, by creating
a more opportune index for the loop to work off of.

That all sounds very good, but there are some problems:

  • The index gets created single-threaded



  • The way data is loaded into the spool is very inefficient



  • The spool is disposed of when the query is finished, It’ll get built over and over again on later executions



  • There’s no missing index request for the spool anywhere



But in your case, since the spool is quite small, it works in your favor.
slowpoke

In the slow query, you have a nested loops join against tbl_src_ArticlesCategories which executes ~7k times, but doesn't have a useful index, so the entire table is scanned for each execution.

The scan:

The details:

You end up reading ~53 million rows in total when all is said and done, because you scan ~736k rows ~7k times.
fastpoke

In the fast plan, you get this instead:

The scan and spool (with seek):

The details

The optimizer for this plan decided to create a good index for you, so it has a more suitable structure to use to locate matching rows in ID_ARTICLE.

You do ~7k seeks, which is much more efficient given the circumstances.
equalizer

You could potentially get equal performance from both queries by adding this index:

CREATE /*UNIQUE?*/ INDEX spool_b_gone
    ON [dbo].[tbl_src_ArticlesCategories]
(
    [ID_ARTICLE]
)
INCLUDE
(
    [ID_LINE],
    [ID_GROUP], 
    [ID_CLASS], 
    [ID_FAM]
);


Though sometimes the optimizer is foolish and may decide to spool even when you have the right index in place.
differences?

The immediate difference that I see is that in the slower execution, you generate the row number over a ton more columns across different tables:

But I'm a bit short on time, so there may be other things contributing to the choice between spool/no spool.

Code Snippets

CREATE /*UNIQUE?*/ INDEX spool_b_gone
    ON [dbo].[tbl_src_ArticlesCategories]
(
    [ID_ARTICLE]
)
INCLUDE
(
    [ID_LINE],
    [ID_GROUP], 
    [ID_CLASS], 
    [ID_FAM]
);

Context

StackExchange Database Administrators Q#305884, answer score: 9

Revisions (0)

No revisions yet.