HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Creating Sequential IDs

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
creatingidssequential

Problem

I'm using SQL Server 2014 with Visual Studio 2013. When I add a new record, identity column (seed 1, increment 1) "ID" increases by 1, and record added to the database.

My table consist of two rows:

ID, set identity, seed=1, increment=1
Column1, int
Column2, nvarchar(10), Primary Key


ID Column1 Date
1.  1  300     01.01.2001
2.  2  301     02.01.2001


My query is as follows:

insert into Table values(302,'02.01.2001')


Clearly it gives an error, because 02.01.2001 already exists.

This, however:

insert into Table values(302,'03.01.2001')


successfully inserts data.

But when I select:

select * from Table


the result is:

ID Column1 Date
1.  1  300     01.01.2001
2.  2  301     02.01.2001
3.  4  302     03.01.2001


Sometimes after this unsuccessful attempts of inserting data, the ID increases but the data doesn't recorded to data, because I have made a mistake. (Like I show above.) Then, after a successful insert query, data records and ID jump to 4 instead of 3.

I have thought of a workaround (it works from my point of view):

declare @maxIDValue int
if exists (select ID from NVDB.dbo.V where ID=1) --check if table has its first record.
    begin
        if (select count(ID) from NVDB.dbo.V) >= 1
        set @maxIDValue = (SELECT MAX(ID) FROM NVDB.dbo.V)
    end 
else set @maxIDValue = 1
if not exists (select Date from NVDB.dbo.V where Date='03.10.2010')
    begin
        dbcc checkident ('NVDB.dbo.V', RESEED, @maxIDValue)
        insert into NVDB.dbo.V values(300,'03.10.2010')
    end

SELECT * FROM NVDB.dbo.V


With this piece of code, I want my table in order and after the insertion attempts. It gives no error, and of course ID goes sequentially.

Is this a right approach? What is the best practice? Do we have a simple way to achieve this or did I write so much code?

Note: My table is small, so performance is not an issue. But when it becomes big, like 3000 records, does this code affect performance? (

Solution

Minutia

A table called V ... really? Columns Column1 and Column2 .... why are you using example tables? In your setup you called it Table but in the code you called it V...

Normally we require that code is real code, not 'example' code. In this case, it seems your actual code is 'real', just the tables it works on is not.
IDENTITY columns

IDENTITY columns are generally there for one reason only, to create a unique field that is fast to use (insert/index/join/search).

These properties are guaranteed in all uses of the field, including concurrent modification of the data (two processes inserting data at the same time will each generate unique ID's still - guaranteed)

You want to have the added use for the column: to ensure there are no 'gaps' in the sequence. This is not what IDENTITY columns are for.

IDENTITY columns, for performance reasons, generally allocate a 'pool' of numbers to each process that's adding data, and as each number is used from the pool, the pool shrinks. If the pool is no longer needed, then the pool is 'returned' to the system and another process can continue with it. If two processes are inserting data at the same time you will often get gaps between the numbers because the pools are different. Also, if there's a "hard crash" on the system, the pool is often completely 'lost', and you could have large gaps in the ranges.

This is the compromise that IDENTITY columns make. They sacrifice sequentialness for performance and concurrency.
Solution

There are two solutions, the easy one, and the hard one.

The easy one is to accept that there will be gaps, and move on. I strongly recommend that you revisit your requirements, and see if you can just use the IDENTITY column and move on.

The hard solution is to build your own system for number generation (although it will still not be perfect... if someone deletes a record, you will still have gaps).

Your solution is too simple, and also too slow. It is, essentially, broken.

If two processes run at the same time, they will allocate the same numbers for records, and you will have duplicate insert situations in to your table. This is because you do not have any transaction handling for your statements... they are all independent. You need to have a table-lock on the table so that no other process can add records while you are finding the max value. The lock is needed for the entire process of finding, and deciding, and inserting the value. You will need to read up on "with holdlock" for your select statements otherwise the lock will not be maintained: See the HOLDLOCK and other documentation on transactions.

Then, after the manipulation is done, you can commit the transaction.

Your performance will suck because you will be locking a lot of data, etc.

The better solution is to have a separate ID table for your key field. A simple table like:

create table VTableKey (
    nextid integer not null
)

insert into VTableKey values (1)


Now, with that table, you can:

  • begin a transaction



  • query-with-update-lock and update just that one record



  • insert into the V table



  • commit the transaction



In that way, you maintain the 'atomicity' of the transaction - the value only changes on a successful insert. I would consider using a stored procedure to encapsulate that logic.

Note that you are still vulnerable to someone deleting records, you will still end up with gaps.

Code Snippets

create table VTableKey (
    nextid integer not null
)

insert into VTableKey values (1)

Context

StackExchange Code Review Q#92344, answer score: 6

Revisions (0)

No revisions yet.