Using timestamp plus integer as primary key
Problem
In SQL Server, what would the drawbacks and disadvantages be of using a primary key that is:
Two Character Table Identifier + YEAR + MONTH + DAY + HOUR + MINUTE + MILLISECOND + random integer between 0 and 100?
Why would this be preferred to using an Integer Auto_Incremented field? (Unless it wouldn't, then I want to know that too.)
I would love reasons why this is a really bad design practice. Casting everything in order to get back that integer value is also a total pain in the ass.
I basically stand one chance at getting my small team of developers on board to help improve our applications' database, which it needs badly: to avoid situations like primary key varchar(50) fields with left-padded zeros, to start normalising the database, or to get rid of comma-separated lists stored in a single field.
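For concreteness, such a key could be generated along these lines (a hypothetical sketch: 'OR' stands in for the two-character identifier of an imagined Orders table):

```sql
-- Hypothetical generator for the proposed key format:
-- 2-char table id + yyyyMMddHHmm + milliseconds + random 0-100.
DECLARE @Key varchar(20) =
      'OR'
    + FORMAT(SYSDATETIME(), 'yyyyMMddHHmm')
    + RIGHT('000' + CAST(DATEPART(MILLISECOND, SYSDATETIME()) AS varchar(3)), 3)
    + RIGHT('000' + CAST(ABS(CHECKSUM(NEWID())) % 101 AS varchar(3)), 3);

SELECT @Key;  -- e.g. 'OR202501311542007042'
```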
Solution
Sure you CAN do it, but why would you? To save a few CAST expressions? That seems a bit weak.
The reasons to have auto-incrementing surrogate int PKs are many:
- You pretty much never have to manage them in normal operation. Ever. The exceptions are rebuilding tables or inserting data in bulk, where you have to turn the identity off and on (sketched below). Operationally it works well with little overhead. Doing it the way you suggest causes much more overhead: you need a UDF to create each key, which burns a lot more CPU cycles than an auto-increment int.
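A minimal sketch of that identity off/on step (dbo.Orders and its columns are assumed for illustration):

```sql
-- Temporarily allow explicit values in an IDENTITY column
-- during a rebuild or data load.
SET IDENTITY_INSERT dbo.Orders ON;

INSERT INTO dbo.Orders (OrderID, CustomerName)
VALUES (1001, N'Contoso');  -- keep the key value from the source system

SET IDENTITY_INSERT dbo.Orders OFF;
```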
- They are the most efficient storage solution. The key you suggest needs a varchar(20)/char(20) or so, meaning 20 bytes per row just for the key; a plain int is only 4 bytes. That might not seem like much, but put 100 million rows in there and you increase the size by around 1.5 GB (the arithmetic is below). Not cool. Then you have to carry it as the FK in another table: another 1.5 GB. Put it in 3 other tables and your DB is needlessly almost 5 GB larger. You get the idea...
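The arithmetic behind that figure, with the column widths checkable via DATALENGTH:

```sql
-- 16 extra bytes per row * 100,000,000 rows
--   = 1,600,000,000 bytes ~= 1.5 GB per table carrying the key.
SELECT
    DATALENGTH(CAST('OR202501311542007042' AS char(20))) AS composite_key_bytes, -- 20
    DATALENGTH(CAST(1 AS int))                           AS int_key_bytes;       -- 4
```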
- Tables should really be linked together by something the users don't care about or understand. As the DBA, that gives you a value you can refer to that the users know nothing of, which becomes important when making highly granular changes, fine-tuning the table, or managing the data structure. I can tell you from experience of working with DBs keyed the way you suggest: they become frustratingly annoying, and you are always tempted to just put in an auto-incrementing int and be done with the complexity and frustration. Imagine, for example, that you wish to quickly insert a whole bunch of records that have come from a different source. With an auto-increment: easy, with minimal hit on server resources. With the composite key, you have to run a UDF to create the key for each insert, and your randomiser's 100 possible keys per millisecond might not be enough: SQL Server may work too fast for you and hand the same ID to 2 records, violating the unique constraint (see the sketch below). Or imagine a user wishes to start "reusing" a key for some reason (it happens, sometimes on some mistaken idea that "recycling" makes the database faster). Users end up trying to tune the DB themselves, a recipe for disaster. Having things happen behind the scenes that they don't understand gives a level of separation that often avoids that: they don't try to optimise anything, keeping their sticky little fingers out of your side of the business, meaning you can optimise it the way it should be done. Much like a teenager with no idea attempting to change the electronic timing of a modern car with their smart phone and a hammer... a job much better done by a professional with the right knowledge and tools.
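A sketch of that contrast (staging.Orders, dbo.OrdersComposite and dbo.fn_MakeOrderKey are hypothetical names; the UDF stands in for whatever routine builds the composite key):

```sql
-- With an IDENTITY key, a bulk load is one cheap statement:
INSERT INTO dbo.Orders (CustomerName)
SELECT CustomerName
FROM staging.Orders;              -- keys are generated for free

-- With the composite key, every row goes through the key-building UDF,
-- and two rows in the same millisecond can still draw the same 0-100
-- random suffix:
INSERT INTO dbo.OrdersComposite (OrderKey, CustomerName)
SELECT dbo.fn_MakeOrderKey(), CustomerName
FROM staging.Orders;              -- risks a unique-constraint violation
```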
- SQL Server expects auto-incrementing int PK-FK pairs and is optimised for them. It means faster processing overall, for instance in the FK checks run when inserting records (see the schema sketch below).
Comparisons between integers are fast in general, and this is just as true for joins. There are very many optimizations in the engine for integer/bigint (too many to list), but as an example, seeking down the b-tree may use linear interpolation instead of binary search. As a second example, batch mode processing is very much faster when the normalized representation fits in 64 bits; this is the case for integer and bigint but not, for example, for UUIDs. As a third example, bitmap filters built on not-null integer/bigint columns can be pushed into the storage engine for early evaluation, and bitmap filters can also be more efficient in batch mode when the 64-bit threshold is not exceeded.
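The pattern in question, as a minimal schema sketch (table names assumed):

```sql
CREATE TABLE dbo.Customers
(
    CustomerID int IDENTITY(1,1) PRIMARY KEY,
    Name       nvarchar(100) NOT NULL
);

CREATE TABLE dbo.Orders
(
    OrderID      int IDENTITY(1,1) PRIMARY KEY,
    CustomerID   int NOT NULL
        REFERENCES dbo.Customers (CustomerID),  -- 4-byte FK, cheap to check on insert
    CustomerName nvarchar(100) NULL
);
```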
- Lastly, I find natural keys absolutely fine for small databases with few users that need little optimisation and management. They can also make for cleaner-looking queries (using the natural key directly in WHERE clauses) when the queries are simple, since you have fewer joined tables and so on. But as soon as the database starts getting large and complex, auto-incrementing int surrogate keys become a lot easier to manage.
My final thought: natural keys are FINE for lookup tables with a low number of members (meaning a short key and therefore low space requirements) that are not subject to a lot of change (meaning low management overhead). An example is two-letter state codes in the US, as sketched below. They potentially mean one less join, and over many stored procs that can add up to a small but measurable performance improvement.
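A sketch of that lookup-table case (table names assumed):

```sql
CREATE TABLE dbo.States
(
    StateCode char(2)     NOT NULL PRIMARY KEY,  -- 'CA', 'TX', ...
    StateName varchar(50) NOT NULL
);

CREATE TABLE dbo.Addresses
(
    AddressID int IDENTITY(1,1) PRIMARY KEY,
    StateCode char(2) NOT NULL
        REFERENCES dbo.States (StateCode)
);

-- The code itself is usually all a query needs, so no join back to dbo.States:
SELECT AddressID, StateCode
FROM dbo.Addresses
WHERE StateCode = 'CA';
```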
Context
StackExchange Database Administrators Q#72342, answer score: 7