patternsqlMinor
Custom ID Numbering - fast way on insert?
Viewed 0 times
fastinsertwaycustomnumbering
Problem
As start: this is for an ETL stored procedure that is serialized, so paralellism is not of concern.
I need to assign custom ID numbers for markers during loads. I can not use an identity field, because the ID numbers are to be unique by a "bucket number" - essentially another numbered field.
I currently use the following code:
and I Do not like it. It uses cursors which I prefer to avoid.
(Sidenote: this code is untested)
Is there a way to do this more efficient, without a cursor? IR numbers must increase from the highest used when new elements are added. Multiple buckets exist (by bucket no) and have their own numbering.
I am always only going to process data for one bucket at a time (one bucket in sim, one in simstg).
I need to assign custom ID numbers for markers during loads. I can not use an identity field, because the ID numbers are to be unique by a "bucket number" - essentially another numbered field.
I currently use the following code:
DECLARE @idRunner smallint
SELECT @idRunner = ISNULL(MAX([Id]),0)
FROM sim.[Variable]
WHERE [BucketRef] = @simBucketNo
DECLARE variable_cursor CURSOR FOR
SELECT DISTINCT p.[Variable]
FROM simstg.[Parameter] p
LEFT OUTER JOIN sim.[Variable] v ON (p.[Variable] = v.[Code])
WHERE p.[BucketRef] = @stgBucketNo
AND v.BucketRef = @simBucketNo
AND v.Code IS NULL
OPEN variable_cursor
DECLARE @variable VARCHAR(64)
FETCH NEXT FROM variable_cursor INTO @variable
WHILE @@FETCH_STATUS = 0
BEGIN
SET @idRunner = @idRunner + 1
INSERT INTO sim.[Variable] ([BucketRef], [VariableNo], [Code])
VALUES (@simBucketNo, @idRunner, @variable)
FETCH NEXT FROM variable_cursor INTO @variable
END
CLOSE variable_cursor
DEALLOCATE variable_cursorand I Do not like it. It uses cursors which I prefer to avoid.
(Sidenote: this code is untested)
Is there a way to do this more efficient, without a cursor? IR numbers must increase from the highest used when new elements are added. Multiple buckets exist (by bucket no) and have their own numbering.
I am always only going to process data for one bucket at a time (one bucket in sim, one in simstg).
Solution
Using a single statement and
Note that I moved the
The common table subqueries (
ROW_NUMBER(). With proper indexes, it will probably be more efficient than cursors.Note that I moved the
p.[BucketRef] = @stgBucketNo from the WHERE to the ON clause. The way you had it, made the LEFT JOIN work as an INNER JOIN.WITH
starting (Id) AS
( SELECT ISNULL( ( SELECT TOP (1) v.Id
FROM sim.[Variable] AS v
WHERE v.BucketRef = @simBucketNo
ORDER BY v.Id DESC
), 0)
),
variables AS
( SELECT
p.[Variable],
idRunner = s.Id + ROW_NUMBER() OVER (ORDER BY p.[Variable])
FROM starting AS s
CROSS JOIN simstg.[Parameter] AS p
LEFT OUTER JOIN sim.[Variable] AS v
ON p.[Variable] = v.[Code]
AND p.BucketRef = @stgBucketNo
WHERE v.BucketRef = @simBucketNo
AND v.Code IS NULL
GROUP BY p.[Variable]
)
INSERT INTO sim.[Variable]
([BucketRef], [VariableNo], [Code])
SELECT @simBucketNo, v.idRunner, v.[Variable]
FROM variables AS v ;The common table subqueries (
WITH) are there only for clarity - and to resemble the original code. You can easily rewrite the statement without them:INSERT INTO sim.[Variable]
([BucketRef], [VariableNo], [Code])
SELECT
@simBucketNo,
idRunner = ISNULL( ( SELECT TOP (1) v.Id
FROM sim.[Variable] AS v
WHERE v.BucketRef = @simBucketNo
ORDER BY v.Id DESC
), 0)
+ ROW_NUMBER() OVER (ORDER BY p.[Variable])
p.[Variable],
FROM simstg.[Parameter] AS p
LEFT OUTER JOIN sim.[Variable] AS v
ON p.[Variable] = v.[Code]
AND p.BucketRef = @stgBucketNo
WHERE v.BucketRef = @simBucketNo
AND v.Code IS NULL
GROUP BY p.[Variable] ;Code Snippets
WITH
starting (Id) AS
( SELECT ISNULL( ( SELECT TOP (1) v.Id
FROM sim.[Variable] AS v
WHERE v.BucketRef = @simBucketNo
ORDER BY v.Id DESC
), 0)
),
variables AS
( SELECT
p.[Variable],
idRunner = s.Id + ROW_NUMBER() OVER (ORDER BY p.[Variable])
FROM starting AS s
CROSS JOIN simstg.[Parameter] AS p
LEFT OUTER JOIN sim.[Variable] AS v
ON p.[Variable] = v.[Code]
AND p.BucketRef = @stgBucketNo
WHERE v.BucketRef = @simBucketNo
AND v.Code IS NULL
GROUP BY p.[Variable]
)
INSERT INTO sim.[Variable]
([BucketRef], [VariableNo], [Code])
SELECT @simBucketNo, v.idRunner, v.[Variable]
FROM variables AS v ;INSERT INTO sim.[Variable]
([BucketRef], [VariableNo], [Code])
SELECT
@simBucketNo,
idRunner = ISNULL( ( SELECT TOP (1) v.Id
FROM sim.[Variable] AS v
WHERE v.BucketRef = @simBucketNo
ORDER BY v.Id DESC
), 0)
+ ROW_NUMBER() OVER (ORDER BY p.[Variable])
p.[Variable],
FROM simstg.[Parameter] AS p
LEFT OUTER JOIN sim.[Variable] AS v
ON p.[Variable] = v.[Code]
AND p.BucketRef = @stgBucketNo
WHERE v.BucketRef = @simBucketNo
AND v.Code IS NULL
GROUP BY p.[Variable] ;Context
StackExchange Database Administrators Q#134741, answer score: 8
Revisions (0)
No revisions yet.