HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Identity column re-seed: when it is necessary?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
identitynecessarycolumnseedwhen

Problem

During one of the last lessons at university (I'm a student), lecturer asked us to develop a database (MySQL Server if it matters) and tiny client app that would consume the database as data source.

One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words). That is, when table row is deleted, it's PK must be reused in subsequent inserts. I have average knowledge in RDBMS, PKs and identity columns. From what I understand, that identity column is just a way to let DB to auto-generate PKs when inserting rows and nothing more. And identity column value shall not be related to row attributes in any way (as long as it is not natural key).

This requirement (strictly sequential identity column) was suspicious to me. I tried to ask the lecturer what is wrong if identity is not sequential (with gaps caused by deletions), but got very abstract answer like "it is convenient for users and useful for DB administrators who maintain the database". No specific examples. The argument "convenient for users" sounds silly, because it doesn't have any meaning in business domain.

Therefore I'm curious if these reasons are real? I can think only of one case when identity column reseed is required -- when identity space is exhausted. But this is more design issue when identity column type was chosen incorrectly, say simple int instead of bigint or uniqueidentifier when table contains billion rows. Suppose, an identity column is a clustered index: can gaps in identity column affect index performance? Maybe there are other real-world reasons for automatic identity column re-seed after each delete I'm not aware of?

Thanks in advance!

Solution

That is, when table row is deleted, it's PK must be reused in subsequent inserts.

What universe is your lecturer from??

That is grossly inefficient. If you try to do that, you will cut your performance prospects down by a factor of 10.

If you need gapless numbers for auditing reasons, build them explicitly, not directly from database tools. And never delete rows, but flag them as "deleted". This will add to the messiness of queries, since they will have to ignore such rows.

In MySQL, InnoDB requires the existence of a unique PRIMARY KEY for each table. But that is the extent of the requirement. The key can even be a string.

Gaps are a convenience to the users and DBAs, not an inconvenience.

I can think of one case where gapless would be convenient -- chunking into groups of 100 rows at a time. But there is a simple workaround using LIMIT 100,1.

Gaps have zero impact on performance. That includes non-numeric indexes. And non-unique indexes. And composite indexes.

Sure, you can run out of ids. I think I have seen it happen twice in nearly 2 decades of using MySQL. I may as well worry about being struck by an asteroid. It's low on my things-that-keep-me-awake-at-night list.

Gaps occur from (at least): INSERT IGNORE, IODKU, REPLACE, DELETE, ROLLBACK (explicit, or due to crash), Multi-master replication (including Galera and Group Replication). Do you really want to come up with workarounds for those?!

Feel free to have us sanity-check anything else that lecturer says that is suspicious.

Context

StackExchange Database Administrators Q#223392, answer score: 19

Revisions (0)

No revisions yet.