patternsqlMajor
Can I rely on reading SQL Server Identity values in order?
Viewed 0 times
canreadingidentityvaluesrelyordersqlserver
Problem
TL;DR: The question below boils down to: When inserting a row, is there a window of opportunity between the generation of a new
Detailed version
I have a SQL Server table with an
I currently rely on the fact that the reading processes can never "skip" a checkpoint. My question is: Can I rely on this property? And if not, what could I do to make it true?
Example: When rows with identity values 1, 2, 3, 4, and 5 are inserted, the reader must not see the row with value 5 prior to seeing the one with value 4. Tests show that the query, which contains an
I believe that at least in theory, there may be a race condition here that might cause this assumption to break. Unfortunately, documentation on
My reasoning is, it must work somehow like
Identity value and the locking of the corresponding row key in the clustered index, where an external observer could see a newer Identity value inserted by a concurrent transaction? (In SQL Server.)Detailed version
I have a SQL Server table with an
Identity column called CheckpointSequence, which is the key of the table's clustered index (which also has a number of additional nonclustered indexes). Rows are inserted into the table by several concurrent processes and threads (at isolation level READ COMMITTED, and without IDENTITY_INSERT). At the same time, there are processes periodically reading rows from the clustered index, ordered by that CheckpointSequence column (also at isolation level READ COMMITTED, with the READ COMMITTED SNAPSHOT option being turned off).I currently rely on the fact that the reading processes can never "skip" a checkpoint. My question is: Can I rely on this property? And if not, what could I do to make it true?
Example: When rows with identity values 1, 2, 3, 4, and 5 are inserted, the reader must not see the row with value 5 prior to seeing the one with value 4. Tests show that the query, which contains an
ORDER BY CheckpointSequence clause (and a WHERE CheckpointSequence > -1 clause), reliably blocks whenever row 4 is to be read, but not yet committed, even if row 5 has already been committed.I believe that at least in theory, there may be a race condition here that might cause this assumption to break. Unfortunately, documentation on
Identity doesn't say a lot about how Identity works in the context of multiple concurrent transactions, it only says "Each new value is generated based on the current seed & increment." and "Each new value for a particular transaction is different from other concurrent transactions on the table." (MSDN)My reasoning is, it must work somehow like
Solution
When inserting a row, is there a window of opportunity between the generation of a new Identity value and the locking of the corresponding row key in the clustered index, where an external observer could see a newer Identity value inserted by a concurrent transaction?
Yes.
The allocation of identity values is independent of the containing user transaction. This is one reason that identity values are consumed even if the transaction is rolled back. The increment operation itself is protected by a latch to prevent corruption, but that is the extent of the protections.
In the specific circumstances of your implementation, the identity allocation (a call to
By running two inserts concurrently with a debugger attached to allow me to freeze one thread just after the identity value is incremented and allocated, I was able to reproduce a scenario where:
After step 3, a query using row_number under locking read committed returned the following:
In your implementation, this would result in Checkpoint ID 3 being skipped incorrectly.
The window of misopportunity is relatively small, but it exists. To give a more realistic scenario than having a debugger attached: An executing query thread can yield the scheduler after step 1 above. This allows a second thread to allocate an identity value, insert and commit, before the original thread resumes to perform its insert.
For clarity, there are no locks or other synchronization objects protecting the identity value after it is allocated and before it is used. For example, after step 1 above, a concurrent transaction can see the new identity value using T-SQL functions like
Fundamentally, there are no more guarantees around identity values than documented:
That really is it.
If strict transactional FIFO processing is required, you likely have no choice but to serialize manually. If the application has less oneous requirements, you have more options. The question isn't 100% clear in that regard. Nevertheless, you may find some useful information in Remus Rusanu's article Using Tables as Queues.
Yes.
The allocation of identity values is independent of the containing user transaction. This is one reason that identity values are consumed even if the transaction is rolled back. The increment operation itself is protected by a latch to prevent corruption, but that is the extent of the protections.
In the specific circumstances of your implementation, the identity allocation (a call to
CMEDSeqGen::GenerateNewValue) is made before the user transaction for the insert is even made active (and so before any locks are taken).By running two inserts concurrently with a debugger attached to allow me to freeze one thread just after the identity value is incremented and allocated, I was able to reproduce a scenario where:
- Session 1 acquires an identity value (3)
- Session 2 acquires an identity value (4)
- Session 2 performs its insert and commits (so row 4 is fully visible)
- Session 1 performs its insert and commits (row 3)
After step 3, a query using row_number under locking read committed returned the following:
In your implementation, this would result in Checkpoint ID 3 being skipped incorrectly.
The window of misopportunity is relatively small, but it exists. To give a more realistic scenario than having a debugger attached: An executing query thread can yield the scheduler after step 1 above. This allows a second thread to allocate an identity value, insert and commit, before the original thread resumes to perform its insert.
For clarity, there are no locks or other synchronization objects protecting the identity value after it is allocated and before it is used. For example, after step 1 above, a concurrent transaction can see the new identity value using T-SQL functions like
IDENT_CURRENT before the row exists in the table (even uncommitted).Fundamentally, there are no more guarantees around identity values than documented:
- Each new value is generated based on the current seed & increment.
- Each new value for a particular transaction is different from other concurrent transactions on the table.
That really is it.
If strict transactional FIFO processing is required, you likely have no choice but to serialize manually. If the application has less oneous requirements, you have more options. The question isn't 100% clear in that regard. Nevertheless, you may find some useful information in Remus Rusanu's article Using Tables as Queues.
Context
StackExchange Database Administrators Q#133556, answer score: 33
Revisions (0)
No revisions yet.