HiveBrain v1.2.0
Get Started
← Back to all entries
gotchasqlMajor

Why does adding a TOP 1 dramatically worsen performance?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
whytopaddingperformancedoesworsendramatically

Problem

I have a fairly simple query

SELECT TOP 1 dc.DOCUMENT_ID,
        dc.COPIES,
        dc.REQUESTOR,
        dc.D_ID,
        cj.FILE_NUMBER
FROM DOCUMENT_QUEUE dc
JOIN CORRESPONDENCE_JOURNAL cj
    ON dc.DOCUMENT_ID = cj.DOCUMENT_ID
WHERE dc.QUEUE_DATE <= GETDATE()
  AND dc.PRINT_LOCATION = 2
ORDER BY cj.FILE_NUMBER


That is giving me horrible performance (like never bothered to wait for it to finish). The query plan looks like this:

However if I remove the TOP 1 I get a plan that looks like this and it runs in 1-2 seconds:

Correct PK & indexing below.

The fact that the TOP 1 changed the query plan doesn't surprise me, I'm just a bit surprised that it makes it so much worse.

Note: I've read the results from this post and understand the concept of a Row Goal etc. What I'm curious about is how I can go about changing the query so that it uses the better plan. Currently I'm dumping the data into a temp table then pulling the first row off of it. I'm wondering if there is a better method.

Edit For people reading this after the fact here are a few extra pieces of information.

  • Document_Queue - PK/CI is D_ID and it has ~5k rows.



  • Correspondence_Journal - PK/CI is FILE_NUMBER, CORRESPONDENCE_ID and it has ~1.4 mil rows.



When I started there were no other indexes. I ended up with one on Correspondence_Journal (Document_Id, File_Number)

Solution

Since you get the correct plan with the ORDER BY, maybe you could just roll your own TOP operator?

SELECT DOCUMENT_ID, COPIES, REQUESTOR, D_ID, FILE_NUMBER
FROM (
    SELECT dc.DOCUMENT_ID,
           dc.COPIES,
           dc.REQUESTOR,
           dc.D_ID,
           cj.FILE_NUMBER,
           ROW_NUMBER() OVER (ORDER BY cj.FILE_NUMBER) AS _rownum
    FROM DOCUMENT_QUEUE dc
    INNER JOIN CORRESPONDENCE_JOURNAL cj
        ON dc.DOCUMENT_ID = cj.DOCUMENT_ID
    WHERE dc.QUEUE_DATE <= GETDATE()
      AND dc.PRINT_LOCATION = 2
) AS sub
WHERE _rownum=1;


In my mind, the query plan for the ROW_NUMBER() above should be the same as if you had an ORDER BY. The query plan should now have a Segment, Sequence Project and finally a Filter operator, the rest should look just like your good plan.

Code Snippets

SELECT DOCUMENT_ID, COPIES, REQUESTOR, D_ID, FILE_NUMBER
FROM (
    SELECT dc.DOCUMENT_ID,
           dc.COPIES,
           dc.REQUESTOR,
           dc.D_ID,
           cj.FILE_NUMBER,
           ROW_NUMBER() OVER (ORDER BY cj.FILE_NUMBER) AS _rownum
    FROM DOCUMENT_QUEUE dc
    INNER JOIN CORRESPONDENCE_JOURNAL cj
        ON dc.DOCUMENT_ID = cj.DOCUMENT_ID
    WHERE dc.QUEUE_DATE <= GETDATE()
      AND dc.PRINT_LOCATION = 2
) AS sub
WHERE _rownum=1;

Context

StackExchange Database Administrators Q#127639, answer score: 30

Revisions (0)

No revisions yet.