patternsqlMinor
Changing primary key from IDENTITY to being persisted Computed column using COALESCE
Viewed 0 times
identitycomputedprimarycoalescecolumnpersistedbeingusingchangingfrom
Problem
In an attempt to decouple an application from our monolithic database, we've tried to change INT IDENTITY columns of various tables to be a PERSISTED computed column that uses COALESCE. Basically, we need the decoupled application an ability to still update the database for common data shared across many applications while still allowing existing applications to create data in these tables without the need for code or procedure modification.
So essentially, we've moved from a column definition of;
to;
In all cases the PkId is also a PRIMARY KEY and in all but one case, it is CLUSTERED. All tables have the same foreign keys and indexes as before. In essence, the new format allows the PkId to be supplied by the decoupled application (as external_id), but also allows the PkId to be the IDENTITY column value therefore allowing existing code that relies on the IDENTITY column through use of SCOPE_IDENTITY and @@IDENTITY to work as it used to.
The problem we've had is that we've come across a couple of queries that used to run in an acceptable time to now blow out completely. The generated query plans used by these queries is nothing like what it used to be before.
Given the new column is a PRIMARY KEY, the same data type as before, and PERSISTED, I would have expected queries and query plans to behave the same as they did before. Should the COMPUTED PERSISTED INT PkId essentially behave the same way as an explicit INT definition in terms of how SQL Server will produce the execution plan? Are there other likely issues with this approach that you can see?
The purpose of this change was supposed to allow us to change the table definition without the need to modify existing procedures and code. Given
So essentially, we've moved from a column definition of;
PkId INT IDENTITY(1,1) PRIMARY KEYto;
PkId AS AS COALESCE(old_id, external_id, new_id) PERSISTED NOT NULL,
old_id INT NULL, -- Values here are from existing records of PkId before table change
external_id INT NULL,
new_id INT IDENTITY(2000000,1) NOT NULLIn all cases the PkId is also a PRIMARY KEY and in all but one case, it is CLUSTERED. All tables have the same foreign keys and indexes as before. In essence, the new format allows the PkId to be supplied by the decoupled application (as external_id), but also allows the PkId to be the IDENTITY column value therefore allowing existing code that relies on the IDENTITY column through use of SCOPE_IDENTITY and @@IDENTITY to work as it used to.
The problem we've had is that we've come across a couple of queries that used to run in an acceptable time to now blow out completely. The generated query plans used by these queries is nothing like what it used to be before.
Given the new column is a PRIMARY KEY, the same data type as before, and PERSISTED, I would have expected queries and query plans to behave the same as they did before. Should the COMPUTED PERSISTED INT PkId essentially behave the same way as an explicit INT definition in terms of how SQL Server will produce the execution plan? Are there other likely issues with this approach that you can see?
The purpose of this change was supposed to allow us to change the table definition without the need to modify existing procedures and code. Given
Solution
FIRST
You probably don't need all three columns:
So, get rid of the
That reduces the new structure to be just:
Now you have only added 8 bytes per row instead of 12 bytes (assuming that you aren't using the
SECOND
Continuing down this path of simplification, let's look at what we have left:
There is never a time when you would need values in both
Now we can look into why we needed the
Combining the
In this approach, you just need to:
-
Leave tables as being:
This adds 0 bytes to each row, instead of 8, or even 12.
SECOND, Part B
A variation on the approach noted directly above would be to have the App code insert values starting with -1 and going down from there. This leaves the
In this approach, you just need to:
-
Leave tables as being:
This adds 0 bytes to each row, instead of 8, or even 12.
Here you s
You probably don't need all three columns:
old_id, external_id, new_id. The new_id column, being an IDENTITY, will have a new value generated for each row, even when you insert into external_id. But, between old_id and external_id, those are pretty much mutually exclusive: either there is already an old_id value or that column, in the current conception, will just be NULL if using external_id or new_id. Since you won't be adding a new "external" id to a row that already exists (i.e. one that has an old_id value), and there won't be any new values coming in for old_id, then there can be one column that is used for both purposes.So, get rid of the
external_id column and rename old_id to be something like old_or_external_id or whatever. This shouldn't require any real changes to anything, yet reduces some of the complication. At most you might need to call the column external_id, even if it contains "old" values, if app code is already written to insert into external_id.That reduces the new structure to be just:
PkId AS AS COALESCE(old_or_external_id, new_id, -1) PERSISTED NOT NULL,
old_or_external_id INT NULL, -- values from existing record OR passed in from app
new_id INT IDENTITY(2000000, 1) NOT NULLNow you have only added 8 bytes per row instead of 12 bytes (assuming that you aren't using the
SPARSE option or Data Compression). And you didn't need to change any code, T-SQL or App code.SECOND
Continuing down this path of simplification, let's look at what we have left:
- The
old_or_external_idcolumn either has values already, or will be given a new value from the app, or will be left asNULL.
- The
new_idwill always have a new value generated, but that value will only be used if theold_or_external_idcolumn isNULL.
There is never a time when you would need values in both
old_or_external_id and new_id. Yes, there will be times when both columns have values due to new_id being an IDENTITY, but those new_id values are ignored. Again, these two fields are mutually exclusive. So what now?Now we can look into why we needed the
external_id in the first place. Considering that it is possible to insert into an IDENTITY column using SET IDENTITY_INSERT {table_name} ON;, you could get away with making no schema changes at all, and only modify your app code to wrap the INSERT statements / operations in SET IDENTITY_INSERT {table_name} ON; and SET IDENTITY_INSERT {table_name} OFF; statements. You then need to determine what starting range to reset the IDENTITY column to (for newly generated values) as it will need to be well above the values that the App code will be inserting since inserting a higher value will cause the next auto-generated value to be greater than the current MAX value. But you can always insert a value that is below the IDENT_CURRENT value.Combining the
old_or_external_id and new_id columns also does not increase the chances of running into an overlapping value situation between auto-generated values and app-generated values since the intention of having the 2, or even 3, columns is to combine them into a Primary Key value, and those are always unique values.In this approach, you just need to:
-
Leave tables as being:
PkId INT IDENTITY(1,1) PRIMARY KEYThis adds 0 bytes to each row, instead of 8, or even 12.
- Determine the starting range for app-generated values. These will be greater than the current MAX value in each table, but less than what will become the minimum value for the auto-generated values.
- Determine what value the auto-generated range should start at. There should be plenty of room between the current MAX value and plenty of room to grow, knowing at the upper limit is just over 2.14 billion. You can then set this new minimum seed value via DBCC CHECKIDENT.
- Wrap app code INSERTs in
SET IDENTITY_INSERT {table_name} ON;andSET IDENTITY_INSERT {table_name} OFF;statements.
SECOND, Part B
A variation on the approach noted directly above would be to have the App code insert values starting with -1 and going down from there. This leaves the
IDENTITY values as being the only ones going up. The benefit here is that you not only don't complicate the schema, you also don't need to worry about running into overlapping IDs (if the app-generated values run into the new auto-generated range). This is only an option if you aren't already using negative ID values (and it seems pretty rare for people to use negative values on auto-generated columns so this should be a likely posibility in most situations).In this approach, you just need to:
-
Leave tables as being:
PkId INT IDENTITY(1,1) PRIMARY KEYThis adds 0 bytes to each row, instead of 8, or even 12.
- The starting range for app-generated values will be
-1.
- Wrap app code INSERTs in
SET IDENTITY_INSERT {table_name} ON;andSET IDENTITY_INSERT {table_name} OFF;statements.
Here you s
Code Snippets
PkId AS AS COALESCE(old_or_external_id, new_id, -1) PERSISTED NOT NULL,
old_or_external_id INT NULL, -- values from existing record OR passed in from app
new_id INT IDENTITY(2000000, 1) NOT NULLPkId INT IDENTITY(1,1) PRIMARY KEYPkId INT IDENTITY(1,1) PRIMARY KEYPkId INT PRIMARY KEY CONSTRAINT [DF_TableName_NextID] DEFAULT (NEXT VALUE FOR...)Context
StackExchange Database Administrators Q#154773, answer score: 4
Revisions (0)
No revisions yet.