HiveBrain v1.2.0
Get Started
← Back to all entries
patternModerate

Choosing primary keys: Scientific names of species or system-assigned numeric identifiers?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
primarynumericscientificsystemnameskeysassignedidentifierschoosingspecies

Problem

Scientific names are well accepted unique identifiers of species. However, scientific names have special characters such as spaces and dots. Can it be a good unique identifier for databasing or should one assign/implement unique numerical identifiers?

If it is with the latter, is there a globally known numeric identifier? Or should one assign own identifiers for the rows?

Solution

I would use my own identifier.

The species name may be unique but
  • it's too long
  • it's a string



For example, in SQL Server, if used as the clustered primary key it will be used in non-clustered indexes, thus repeating the long string. And typically foreign keys omn child tables will go to the primary key, thus repeating it again

As a string, you have the overhead of sorting and comparison (case, accents, etc).

Using a surrogate numeric key avoids these problems: but you must create a unique non-clustered index on the species name.

Is species name a good long term identifier too? Not my area of expertise but don't many species have alternate names, or controversies, or get reclassified, or "maybe this species"?

Example: how many Giraffe species are there? 9? 2? 6? 8? 4?

Context

StackExchange Database Administrators Q#197363, answer score: 12

Revisions (0)

No revisions yet.