HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

MS SQL Server - Table with Queue logic - How can I be sure that parallel tasks will not retrieve the same ID?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
canthesamesqlwithlogicsureretrievetasksthat

Problem

I have a table, sys_QueueJob, storing Queue logic data.

I thought that having an update with a return would be enough...
However, now I am not sure if this is 100% safe.

How can I be sure that no matter an many parallel requests will not return the same ID?

UPDATE sys_QueueJob
SET ExecutionStartedOn = GETDATE()
OUTPUT DELETED.Id as Result
WHERE Id = (select top 1 x.Id
            from sys_QueueJob x with (rowlock, updlock, readpast)
                        where x.ExecutionFinishedOn is null
                            AND (
                                x.ExecutionStartedOn is null
                                OR x.ExecutionStartedOn < DATEADD(HOUR, -1, GETDATE())
                               )
                        order by x.CreatedOn asc)

Solution

queueing

Like I talk about in this post, a reliable way to process queues is to use a query like this:

WITH 
    q AS
(
    SELECT TOP (1) 
        x.* 
    FROM sys_QueueJob x WITH (ROWLOCK, UPDLOCK, READPAST)
    WHERE x.ExecutionFinishedOn IS NULL
    AND 
    (
          x.ExecutionStartedOn IS NULL 
       OR x.ExecutionStartedOn < DATEADD(HOUR, -1, GETDATE())
    )
    ORDER BY 
        x.CreatedOn ASC
        /*You may need to also order by Id if CreatedOn isn't unique*/
)
UPDATE 
    q
SET 
    ExecutionStartedOn = GETDATE()
OUTPUT 
    Deleted.Id as Result;


Of course, the bigger challenge for most queue queries is indexing them to make work easy to distribute. Usually, a filtered index that excludes finished work (ExecutionFinishedOn), and keys on filtering/sorting elements is sufficient (ExecutionStartedOn, CreatedOn).

In your case, you're looking both for items that haven't been started, and items that were started over an hour ago. It may make more sense to split these into two "workers" that look for each disposition independently, or adding some logic to go look for rows that started over an hour ago if no rows are found that haven't been started yet.
comparing techniques

The reason I've used a common table expression rather than an update with a subquery is to avoid a query plan with multiple locking calls to the table.

Take a look at this example, which uses a small helper view called dbo.WhatsUpLocks to summarize locks held by a session. I'm also using Extended Events to observe locks acquired and released on my table and related objects (indexes, default constraints, etc.) when the update query runs.
first technique

BEGIN TRANSACTION;
    DECLARE
        @id integer, 
        @reputation integer;

    WITH 
        q4 AS
    (
        SELECT TOP (1) 
            fq.*
        FROM dbo.four_queue AS fq WITH(READPAST, ROWLOCK, UPDLOCK)
        WHERE fq.in_process = 0
        ORDER BY 
            fq.id
    )
    UPDATE q4
    SET 
        q4.in_process = 1,
        q4.start_date = SYSDATETIME(),
        @id = q4.id,
        @reputation = q4.reputation
    FROM q4;
    
    SELECT
        wul.*
    FROM dbo.WhatsUpLocks(@@SPID) AS wul;
ROLLBACK;
GO


Here's the query plan, with a single read reference to the table:

Here are the locks that were taken and released during query execution:

And here are the locks that remain before the transaction is rolled back:

second technique

Let's compare that with your initial attempt, applied to the table setup in the post I linked earlier.

BEGIN TRANSACTION;
    DECLARE
        @id integer, 
        @reputation integer;

    UPDATE 
        q4
    SET
        q4.in_process = 1,
        q4.start_date = SYSDATETIME(),
        @id = q4.id,
        @reputation = q4.reputation
    FROM dbo.four_queue AS q4
    WHERE q4.id = 
    (
        SELECT TOP (1) 
            x.id
        FROM dbo.four_queue AS x WITH (ROWLOCK, UPDLOCK, READPAST)
        WHERE x.in_process = 0
        ORDER BY 
            x.id
    );

    SELECT
        wul.*
    FROM dbo.WhatsUpLocks(@@SPID) AS wul;
ROLLBACK;
GO


Here's the query plan, which now has two read references to the table:

Here are the locks that were taken and released during query execution:

And here are the locks that remain before the transaction is rolled back:

differences

The locks acquired and released during query execution are far different. There are way fewer in the common table expression than in the update with subquery, largely because of the single read and update reference in the query plan.

There's only a slight difference in the remaining locks before the query is rolled back. The bottom row in my result shows an IX lock on pages, and yours shows UIX locks on pages.

How much these differences add up depends on concurrency while processing queues, but you might as well do it the right way so you don't have to worry about it falling apart later.

As a final note, I don't want you to come away from this thinking that common table expressions are magickal. They are not. You could get the same results with a derived table or a created view that produces a single result to update, without the use of a subquery. In this case, I find the common table expression makes the query more understandable. That is all.

Code Snippets

WITH 
    q AS
(
    SELECT TOP (1) 
        x.* 
    FROM sys_QueueJob x WITH (ROWLOCK, UPDLOCK, READPAST)
    WHERE x.ExecutionFinishedOn IS NULL
    AND 
    (
          x.ExecutionStartedOn IS NULL 
       OR x.ExecutionStartedOn < DATEADD(HOUR, -1, GETDATE())
    )
    ORDER BY 
        x.CreatedOn ASC
        /*You may need to also order by Id if CreatedOn isn't unique*/
)
UPDATE 
    q
SET 
    ExecutionStartedOn = GETDATE()
OUTPUT 
    Deleted.Id as Result;
BEGIN TRANSACTION;
    DECLARE
        @id integer, 
        @reputation integer;

    WITH 
        q4 AS
    (
        SELECT TOP (1) 
            fq.*
        FROM dbo.four_queue AS fq WITH(READPAST, ROWLOCK, UPDLOCK)
        WHERE fq.in_process = 0
        ORDER BY 
            fq.id
    )
    UPDATE q4
    SET 
        q4.in_process = 1,
        q4.start_date = SYSDATETIME(),
        @id = q4.id,
        @reputation = q4.reputation
    FROM q4;
    
    SELECT
        wul.*
    FROM dbo.WhatsUpLocks(@@SPID) AS wul;
ROLLBACK;
GO
BEGIN TRANSACTION;
    DECLARE
        @id integer, 
        @reputation integer;

    UPDATE 
        q4
    SET
        q4.in_process = 1,
        q4.start_date = SYSDATETIME(),
        @id = q4.id,
        @reputation = q4.reputation
    FROM dbo.four_queue AS q4
    WHERE q4.id = 
    (
        SELECT TOP (1) 
            x.id
        FROM dbo.four_queue AS x WITH (ROWLOCK, UPDLOCK, READPAST)
        WHERE x.in_process = 0
        ORDER BY 
            x.id
    );

    SELECT
        wul.*
    FROM dbo.WhatsUpLocks(@@SPID) AS wul;
ROLLBACK;
GO

Context

StackExchange Database Administrators Q#335133, answer score: 6

Revisions (0)

No revisions yet.