HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Most cost efficient way to page through a poorly ordered table?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
tableefficientwaypoorlypagethroughcostmostordered

Problem

I have a table that has three columns:
HashUID1, HashUID2, Address_Name (which is a textual email address, and the previous two hash colunms are of some crazy creation to link event participant tables to email addresses. its ugly, it barely works its out of my control. Focus on the address_name index)

It has 78 million rows. Not properly sorted. Regardless, this index is split onto a lot of fast LUN's and performs REALLY fast index seeks.

I need to create a series of queries to extract only 20,000 "rows per page" at a time, but avoid conflicts or dupes. Since there is no identity column, or easily ordered column, is there an easy way to select all, and page through it?

Am I correct in saying that if I do a select * from hugetablewithemails into a temp table, then select through it by row_number that the table remains in memory for the duration of the transaction, which, to me, is an excessive amount of memory resources?
This seems the preferred method of paging. I'd rather page by statistical percentages. :(

There is one index which maintains the address_name email address in order, and is well maintained. For the past week I have been meaning to help this other developer by spending some time on looking into building a proc that spits out ranges based on windowing functions based on statistics (which I am not great at, but this query really interested me) to provide a range of characters 1 through (variable) LEFT LIKE chars of the index, that meets 20,000 rows--But I have not had time to even start the query...

Couple questions:

-
Any suggestions? Not looking for actual code, just some hints or suggestions based on experiences, maybe caveats. I want to avoid additional index scans after the initial scan.

-
Is this the right approach?

-
I'm thinking of breaking the sum of the index of all email addresses, gathering rowcount(*), /20,000, and usinng that as a windowing function to group min/max substring(1,5) values based on percentages of total rowcount

Solution

Essentially, you are asking if you can perform a single ordered scan through the data overall, while making no copies of the data, and returning 'x' disjoint sets of rows from the full set on each call. This is exactly the behaviour of an appropriately-configured API cursor.

For example, using the AdventureWorks table Person.EmailAddress to return sets of 1,000 rows:

DECLARE 
    @cur integer,
    -- FAST_FORWARD | AUTO_FETCH | AUTO_CLOSE
    @scrollopt integer = 16 | 8192 | 16384,
    -- READ_ONLY, CHECK_ACCEPTED_OPTS, READ_ONLY_ACCEPTABLE
    @ccopt integer = 1 | 32768 | 65536, 
    @rowcount integer = 1000,
    @rc integer;

-- Open the cursor and return the first 1,000 rows
EXECUTE @rc = sys.sp_cursoropen
    @cur OUTPUT,
    N'
    SELECT *
    FROM AdventureWorks2012.Person.EmailAddress
        WITH (INDEX([IX_EmailAddress_EmailAddress]))
    ORDER BY EmailAddress;
    ',
    @scrollopt OUTPUT,
    @ccopt OUTPUT,
    @rowcount OUTPUT;

IF @rc <> 16 -- FastForward cursor automatically closed
BEGIN
    -- Name the cursor so we can use CURSOR_STATUS
    EXECUTE sys.sp_cursoroption
        @cur, 
        2, 
        'MyCursorName';

    -- Until the cursor auto-closes
    WHILE CURSOR_STATUS('global', 'MyCursorName') = 1
    BEGIN
        EXECUTE sys.sp_cursorfetch
            @cur,
            2,
            0,
            1000;
    END;
END;


Each fetch operation returns a maximum of 1,000 rows, remembering the position of the scan from the previous call.

Code Snippets

DECLARE 
    @cur integer,
    -- FAST_FORWARD | AUTO_FETCH | AUTO_CLOSE
    @scrollopt integer = 16 | 8192 | 16384,
    -- READ_ONLY, CHECK_ACCEPTED_OPTS, READ_ONLY_ACCEPTABLE
    @ccopt integer = 1 | 32768 | 65536, 
    @rowcount integer = 1000,
    @rc integer;

-- Open the cursor and return the first 1,000 rows
EXECUTE @rc = sys.sp_cursoropen
    @cur OUTPUT,
    N'
    SELECT *
    FROM AdventureWorks2012.Person.EmailAddress
        WITH (INDEX([IX_EmailAddress_EmailAddress]))
    ORDER BY EmailAddress;
    ',
    @scrollopt OUTPUT,
    @ccopt OUTPUT,
    @rowcount OUTPUT;

IF @rc <> 16 -- FastForward cursor automatically closed
BEGIN
    -- Name the cursor so we can use CURSOR_STATUS
    EXECUTE sys.sp_cursoroption
        @cur, 
        2, 
        'MyCursorName';

    -- Until the cursor auto-closes
    WHILE CURSOR_STATUS('global', 'MyCursorName') = 1
    BEGIN
        EXECUTE sys.sp_cursorfetch
            @cur,
            2,
            0,
            1000;
    END;
END;

Context

StackExchange Database Administrators Q#82715, answer score: 17

Revisions (0)

No revisions yet.