HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Custom ID Numbering - fast way on insert?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
fastinsertwaycustomnumbering

Problem

As start: this is for an ETL stored procedure that is serialized, so paralellism is not of concern.

I need to assign custom ID numbers for markers during loads. I can not use an identity field, because the ID numbers are to be unique by a "bucket number" - essentially another numbered field.

I currently use the following code:

DECLARE @idRunner smallint
SELECT @idRunner = ISNULL(MAX([Id]),0)
    FROM sim.[Variable]
    WHERE [BucketRef] = @simBucketNo

DECLARE variable_cursor CURSOR FOR
    SELECT DISTINCT p.[Variable]
        FROM simstg.[Parameter] p
        LEFT OUTER JOIN sim.[Variable] v ON (p.[Variable] = v.[Code])
        WHERE p.[BucketRef] = @stgBucketNo
            AND v.BucketRef = @simBucketNo
            AND v.Code IS NULL
OPEN variable_cursor
DECLARE @variable VARCHAR(64)

FETCH NEXT FROM variable_cursor INTO @variable
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @idRunner = @idRunner + 1
    INSERT INTO sim.[Variable] ([BucketRef], [VariableNo], [Code])
        VALUES (@simBucketNo, @idRunner, @variable)

    FETCH NEXT FROM variable_cursor INTO @variable
END

CLOSE variable_cursor
DEALLOCATE variable_cursor


and I Do not like it. It uses cursors which I prefer to avoid.
(Sidenote: this code is untested)

Is there a way to do this more efficient, without a cursor? IR numbers must increase from the highest used when new elements are added. Multiple buckets exist (by bucket no) and have their own numbering.

I am always only going to process data for one bucket at a time (one bucket in sim, one in simstg).

Solution

Using a single statement and ROW_NUMBER(). With proper indexes, it will probably be more efficient than cursors.

Note that I moved the p.[BucketRef] = @stgBucketNo from the WHERE to the ON clause. The way you had it, made the LEFT JOIN work as an INNER JOIN.

WITH 
  starting (Id) AS
    ( SELECT ISNULL( ( SELECT TOP (1) v.Id 
                       FROM sim.[Variable] AS v 
                       WHERE v.BucketRef = @simBucketNo 
                       ORDER BY v.Id DESC
                     ), 0) 
    ), 
  variables AS
    ( SELECT  
          p.[Variable],
          idRunner = s.Id + ROW_NUMBER() OVER (ORDER BY p.[Variable])
        FROM starting AS s 
          CROSS JOIN simstg.[Parameter] AS p
          LEFT OUTER JOIN sim.[Variable] AS v 
            ON  p.[Variable] = v.[Code] 
            AND p.BucketRef = @stgBucketNo 
        WHERE v.BucketRef = @simBucketNo
          AND v.Code IS NULL
        GROUP BY p.[Variable]
    )
INSERT INTO sim.[Variable] 
    ([BucketRef], [VariableNo], [Code])
SELECT @simBucketNo, v.idRunner, v.[Variable]
FROM variables AS v ;


The common table subqueries (WITH) are there only for clarity - and to resemble the original code. You can easily rewrite the statement without them:

INSERT INTO sim.[Variable] 
    ([BucketRef], [VariableNo], [Code])
SELECT 
    @simBucketNo,
    idRunner = ISNULL( ( SELECT TOP (1) v.Id 
                         FROM sim.[Variable] AS v 
                         WHERE v.BucketRef = @simBucketNo 
                         ORDER BY v.Id DESC
                       ), 0) 
             + ROW_NUMBER() OVER (ORDER BY p.[Variable])
    p.[Variable],
FROM simstg.[Parameter] AS p
    LEFT OUTER JOIN sim.[Variable] AS v 
        ON  p.[Variable] = v.[Code] 
        AND p.BucketRef = @stgBucketNo 
WHERE v.BucketRef = @simBucketNo
  AND v.Code IS NULL
GROUP BY p.[Variable] ;

Code Snippets

WITH 
  starting (Id) AS
    ( SELECT ISNULL( ( SELECT TOP (1) v.Id 
                       FROM sim.[Variable] AS v 
                       WHERE v.BucketRef = @simBucketNo 
                       ORDER BY v.Id DESC
                     ), 0) 
    ), 
  variables AS
    ( SELECT  
          p.[Variable],
          idRunner = s.Id + ROW_NUMBER() OVER (ORDER BY p.[Variable])
        FROM starting AS s 
          CROSS JOIN simstg.[Parameter] AS p
          LEFT OUTER JOIN sim.[Variable] AS v 
            ON  p.[Variable] = v.[Code] 
            AND p.BucketRef = @stgBucketNo 
        WHERE v.BucketRef = @simBucketNo
          AND v.Code IS NULL
        GROUP BY p.[Variable]
    )
INSERT INTO sim.[Variable] 
    ([BucketRef], [VariableNo], [Code])
SELECT @simBucketNo, v.idRunner, v.[Variable]
FROM variables AS v ;
INSERT INTO sim.[Variable] 
    ([BucketRef], [VariableNo], [Code])
SELECT 
    @simBucketNo,
    idRunner = ISNULL( ( SELECT TOP (1) v.Id 
                         FROM sim.[Variable] AS v 
                         WHERE v.BucketRef = @simBucketNo 
                         ORDER BY v.Id DESC
                       ), 0) 
             + ROW_NUMBER() OVER (ORDER BY p.[Variable])
    p.[Variable],
FROM simstg.[Parameter] AS p
    LEFT OUTER JOIN sim.[Variable] AS v 
        ON  p.[Variable] = v.[Code] 
        AND p.BucketRef = @stgBucketNo 
WHERE v.BucketRef = @simBucketNo
  AND v.Code IS NULL
GROUP BY p.[Variable] ;

Context

StackExchange Database Administrators Q#134741, answer score: 8

Revisions (0)

No revisions yet.