HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Is a MERGE with OUTPUT better practice than a conditional INSERT and SELECT?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
insertmergewithpracticethanoutputbetterconditionalandselect

Problem

We often encounter the "If not exists, insert" situation. Dan Guzman's blog has an excellent investigation of how to make this process threadsafe.

I have a basic table that simply catalogs a string to an integer from a SEQUENCE. In a stored procedure I need to either get the integer key for the value if it exists, or INSERT it and then get the resulting value. There is a uniqueness constraint on the dbo.NameLookup.ItemName column so data integrity isn't at risk but I don't want to encounter the exceptions.

It's not an IDENTITY so I can't get SCOPE_IDENTITY and the value could be NULL in certain cases.

In my situation I only have to deal with INSERT safety on the table so I'm trying to decide if it's better practice to use MERGE like this:

SET NOCOUNT, XACT_ABORT ON;

DECLARE @vValueId INT 
DECLARE @inserted AS TABLE (Id INT NOT NULL)

MERGE 
    dbo.NameLookup WITH (HOLDLOCK) AS f 
USING 
    (SELECT @vName AS val WHERE @vName IS NOT NULL AND LEN(@vName) > 0) AS new_item
        ON f.ItemName= new_item.val
WHEN MATCHED THEN
    UPDATE SET @vValueId = f.Id
WHEN NOT MATCHED BY TARGET THEN
    INSERT
      (ItemName)
    VALUES
      (@vName)
OUTPUT inserted.Id AS Id INTO @inserted;
SELECT @vValueId = s.Id FROM @inserted AS s


I could do this witout using MERGE with just a conditional INSERT followed by a SELECT
I think that this second approach is clearer to the reader, but I'm not convinced it's "better" practice

SET NOCOUNT, XACT_ABORT ON;

INSERT INTO 
    dbo.NameLookup (ItemName)
SELECT
    @vName
WHERE
    NOT EXISTS (SELECT * FROM dbo.NameLookup AS t WHERE @vName IS NOT NULL AND LEN(@vName) > 0 AND t.ItemName = @vName)

DECLARE @vValueId int;
SELECT @vValueId = i.Id FROM dbo.NameLookup AS i WHERE i.ItemName = @vName


Or perhaps there is another better way that I haven't considered

I did search and reference other questions. This one: https://stackoverflow.com/questions/5288283/sql-server-insert-if-not-exists-best-pra

Solution

Because you are using a Sequence, you can use the same NEXT VALUE FOR function -- that you already have in a Default Constraint on the Id Primary Key field -- to generate a new Id value ahead of time. Generating the value first means that you don't need to worry about not having SCOPE_IDENTITY, which then means that you don't need either the OUTPUT clause or doing an additional SELECT to get the new value; you will have the value before you do the INSERT, and you don't even need to mess with SET IDENTITY INSERT ON / OFF :-)

So that takes care of part of the overall situation. The other part is handling the concurrency issue of two processes, at the exact same time, not finding an existing row for the exact same string, and proceeding with the INSERT. The concern is about avoiding the Unique Constraint violation that would occur.

One way to handle these types of concurrency issues is to force this particular operation to be single threaded. The way to do that is by using application locks (which work across sessions). While effective, they can be a bit heavy-handed for a situation like this where the frequency of collisions is probably fairly low.

The other way to deal with the collisions is to accept that they will sometimes occur and handle them rather than try to avoid them. Using the TRY...CATCH construct, you can effectively trap a specific error (in this case: "unique constraint violation", Msg 2601) and re-execute the SELECT to get the Id value since we know that it now exists due to being in the CATCH block with that particular error. Other errors can be handled in the typical RAISERROR / RETURN or THROW manner.

Test Setup: Sequence, Table, and Unique Index

USE [tempdb];

CREATE SEQUENCE dbo.MagicNumber
  AS INT
  START WITH 1
  INCREMENT BY 1;

CREATE TABLE dbo.NameLookup
(
  [Id] INT NOT NULL
         CONSTRAINT [PK_NameLookup] PRIMARY KEY CLUSTERED
        CONSTRAINT [DF_NameLookup_Id] DEFAULT (NEXT VALUE FOR dbo.MagicNumber),
  [ItemName] NVARCHAR(50) NOT NULL         
);

CREATE UNIQUE NONCLUSTERED INDEX [UIX_NameLookup_ItemName]
  ON dbo.NameLookup ([ItemName]);
GO


Test Setup: Stored Procedure

CREATE PROCEDURE dbo.GetOrInsertName
(
  @SomeName NVARCHAR(50),
  @ID INT OUTPUT,
  @TestRaceCondition BIT = 0
)
AS
SET NOCOUNT ON;

BEGIN TRY
  SELECT @ID = nl.[Id]
  FROM   dbo.NameLookup nl
  WHERE  nl.[ItemName] = @SomeName
  AND    @TestRaceCondition = 0;

  IF (@ID IS NULL)
  BEGIN
    SET @ID = NEXT VALUE FOR dbo.MagicNumber;

    INSERT INTO dbo.NameLookup ([Id], [ItemName])
    VALUES (@ID, @SomeName);
  END;
END TRY
BEGIN CATCH
  IF (ERROR_NUMBER() = 2601) -- "Cannot insert duplicate key row in object"
  BEGIN
    SELECT @ID = nl.[Id]
    FROM   dbo.NameLookup nl
    WHERE  nl.[ItemName] = @SomeName;
  END;
  ELSE
  BEGIN
    ;THROW; -- SQL Server 2012 or newer
    /*
    DECLARE @ErrorNumber INT = ERROR_NUMBER(),
            @ErrorMessage NVARCHAR(4000) = ERROR_MESSAGE();

    RAISERROR(N'Msg %d: %s', 16, 1, @ErrorNumber, @ErrorMessage);
    RETURN;
    */
  END;

END CATCH;
GO


The Test

DECLARE @ItemID INT;
EXEC dbo.GetOrInsertName
  @SomeName = N'test1',
  @ID = @ItemID OUTPUT;
SELECT @ItemID AS [ItemID];
GO

DECLARE @ItemID INT;
EXEC dbo.GetOrInsertName
  @SomeName = N'test1',
  @ID = @ItemID OUTPUT,
  @TestRaceCondition = 1;
SELECT @ItemID AS [ItemID];
GO


Question from O.P.


Why is this better than the MERGE? Won't I get the same functionality without the TRY by using the WHERE NOT EXISTS clause?

MERGE has various "issues" (several references are linked in @SqlZim's answer so no need to duplicate that info here). And, there is no additional locking in this approach (less contention), so it should be better on concurrency. In this approach you'll never get a Unique Constraint violation, all without any HOLDLOCK, etc. It's pretty much guaranteed to work.

The reasoning behind this approach is:

  • If you have enough executions of this procedure such that you need to worry about collisions, then you don't want to:



  • take any more steps than are necessary



  • hold locks on any resources for longer than necessary



  • Since collisions can only happen upon new entries (new entries submitted at the exact same time), the frequency of falling into the CATCH block in the first place will be pretty low. It makes more sense to optimize the code that will run 99% of the time instead of the code that will run 1% of the time (unless there is no cost to optimizing both, but that is not the case here).



Comment from @SqlZim's answer (emphasis added)


I personally prefer to try and tailor a solution to avoid doing that when possible. In this case, I don't feel that using the locks from serializable is a heavy handed approach, and I would be confident it would handle high concurrency well.

I would agree with this first sentence if it were amended to state "and _when prudent". Just because something is technically possible does

Code Snippets

USE [tempdb];

CREATE SEQUENCE dbo.MagicNumber
  AS INT
  START WITH 1
  INCREMENT BY 1;

CREATE TABLE dbo.NameLookup
(
  [Id] INT NOT NULL
         CONSTRAINT [PK_NameLookup] PRIMARY KEY CLUSTERED
        CONSTRAINT [DF_NameLookup_Id] DEFAULT (NEXT VALUE FOR dbo.MagicNumber),
  [ItemName] NVARCHAR(50) NOT NULL         
);

CREATE UNIQUE NONCLUSTERED INDEX [UIX_NameLookup_ItemName]
  ON dbo.NameLookup ([ItemName]);
GO
CREATE PROCEDURE dbo.GetOrInsertName
(
  @SomeName NVARCHAR(50),
  @ID INT OUTPUT,
  @TestRaceCondition BIT = 0
)
AS
SET NOCOUNT ON;

BEGIN TRY
  SELECT @ID = nl.[Id]
  FROM   dbo.NameLookup nl
  WHERE  nl.[ItemName] = @SomeName
  AND    @TestRaceCondition = 0;

  IF (@ID IS NULL)
  BEGIN
    SET @ID = NEXT VALUE FOR dbo.MagicNumber;

    INSERT INTO dbo.NameLookup ([Id], [ItemName])
    VALUES (@ID, @SomeName);
  END;
END TRY
BEGIN CATCH
  IF (ERROR_NUMBER() = 2601) -- "Cannot insert duplicate key row in object"
  BEGIN
    SELECT @ID = nl.[Id]
    FROM   dbo.NameLookup nl
    WHERE  nl.[ItemName] = @SomeName;
  END;
  ELSE
  BEGIN
    ;THROW; -- SQL Server 2012 or newer
    /*
    DECLARE @ErrorNumber INT = ERROR_NUMBER(),
            @ErrorMessage NVARCHAR(4000) = ERROR_MESSAGE();

    RAISERROR(N'Msg %d: %s', 16, 1, @ErrorNumber, @ErrorMessage);
    RETURN;
    */
  END;

END CATCH;
GO
DECLARE @ItemID INT;
EXEC dbo.GetOrInsertName
  @SomeName = N'test1',
  @ID = @ItemID OUTPUT;
SELECT @ItemID AS [ItemID];
GO

DECLARE @ItemID INT;
EXEC dbo.GetOrInsertName
  @SomeName = N'test1',
  @ID = @ItemID OUTPUT,
  @TestRaceCondition = 1;
SELECT @ItemID AS [ItemID];
GO

Context

StackExchange Database Administrators Q#158092, answer score: 9

Revisions (0)

No revisions yet.