HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

When should a primary key be meaningful?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
primaryshouldwhenmeaningfulkey

Problem

Excuse the looseness of any definitions here, but I am trying to explore a simple concept.

A Primary Key uniquely identifies a row. There are possibly other columns in a table with unique values, so they can also uniquely identify a row (Candidate Keys), but the Primary Key is the one designated for the task.

Properties which make a Primary Key more useful include:

  • guaranteed to be unique; some other unique column values may be duplicated as the table grows



  • unlikely to change; although Foreign Key constraints can include cascading, it is better not to require this



  • will not be recycled; a row which is deleted for some reason should not free the old PK value to be reused



For these reasons, I generally recommend that the Primary Key have no intrinsic value, so there would never be a reason to change or recycle a value. That is, it should be otherwise meaningless.

I have seen Primary Keys which include some sort of code, such as a client code based on the name. The obvious problems are (a) if the client name changes then the PK should change too, and (b) there is too much of a risk of conflict with clients with similar names.

A half exception is the use of an auto-incremented number, which has a minor meaning of a sequence number. However, it is still stable.

The question is, under what circumstances, if ever, is preferable to use a primary key with some other real meaning? That is, is there anything wrong with the advice that the PK should be arbitrary, and you can generally get by with a sequence number?

Solution

under what circumstances, if ever, is preferable to use a primary key with some other real meaning? (emphasis added)

Given that the focus of this question is "preferable" and not "acceptable", and accepting that this is still a highly subjective topic, I will say that I cannot think of a situation where it is best for the system to have a truly natural key for a variety of reasons (most of which has been said before in other answers that Paul linked to in a comment on the question):

  • What is thought to be unique is not always unique (e.g. Social Security Numbers / SSNs in the U.S.)



  • Sometimes things change, either in value or uniqueness (we don't control the external world)



  • Even when something should be "stable" in value and uniqueness (e.g. SKU, perhaps), can the incoming value be guaranteed to be correct? Humans often mistype stuff doing data entry. There are also bugs in export processes that might cause values in a file imported by your system to be incorrect. There are also bugs in other systems that feed data into yours that can allow for the data itself to not be entirely correct, even if their export process worked correctly.



I emphasize "truly" because there are two situations where I prefer to not have a new surrogate key:

Bridge Tables

(or whatever you like to call tables used only, or mainly, to represent many-to-many relationships)

Thing ThingXTag Tag
------ --------- ---
ThingID INT AutoMagic PK ---> ThingID INT PK, FK
Stuff SomeType TagID INT PK, FK

When modeling a bridge table (a table which doesn't exist in the logical model but is needed in the physical model), the PK there should be the existing Primary Key columns of the tables being related via this table. This allows for enforcing the proper uniqueness and non-NULL-ness of the values without needing a separate unique index / constraint. In the rare case of needing to foreign key to this relationship, it will:

  • be meaningful in that each of the key columns will actually point back to the original source tables without needing yet another join, and



  • prevent someone or something from updating the key columns that form the relationship between the two primary tables without updating the value being linked to via a foreign key.



WackyTable ThingXTag
---------- ---------
WackyTableID INT AutoMagic PK
ThingID INT FK ---> ThingID INT PK, FK (to Thing.ThingID)
TagID INT FK ---> TagID INT PK, FK (to Tag.TagID)
AttributeX VARCHAR InsertDate DATETIME
InsertDate DATETIME


I have worked on a system where these bridge tables had their own auto-incrementing, surrogate key PKs, and the single column surrogate key of the bridge table was referenced in other tables via FK:

WackyTable ThingXTag
---------- ---------
WackyTableID INT AutoMagic PK
ThingXTagID INT FK ---> ThingXTagID INT AutoMagic PK
AttributeX VARCHAR ThingID INT FK (to Thing.ThingID)
InsertDate DATETIME TagID INT FK (to Tag.TagID)
InsertDate DATETIME


It was a horrible, confusing mess that we wasted way too much time on for debugging, etc.

Sibling Tables

These are tables that are truly a single entity and thus have a 1-to-1 relationship. They are only split into two (or more, as is needed) tables for performance reasons. I have done this with tables having 1 million (or more) rows where it was either very wide, or if it was moderately wide and there were columns that were either not used very frequently or were string of over 50 bytes. Stuff like that. This keeps the core properties of the entity in a narrower table that fits more rows on each data page.

In these cases, the "sibling" table is at the exact same level as the initial table and should have the same
PK. There is no use in giving it an auto-incrementing surrogate key since each row has what amounts to a natural key from the initial table.

Product ProductProperty
------- ---------------
ProductID INT AutoMagic PK ---> ProductID INT PK, FK (to Product.ProductID)
Name VARCHAR ShortDescription VARCHAR
SKU VARCHAR SomethingElse SomeType
Quantity INT UpdateDate DATETIME
CreateDate DATETIME
UpdateDate DATETIME
`

To be clear, I am speaking in terms of the physical model, not the conceptual model. I assumed that the focus of this question was the physical model since it is framed in the context of issues that do not exist conceptually: surrogate keys, issues dealing with usage of the primary key value, etc. With that in mind, I was not implying that natural keys should not be stored and used fo

Context

StackExchange Database Administrators Q#201570, answer score: 8

Revisions (0)

No revisions yet.