patternsqlMinor
Low cardinality fields: integer or string
Viewed 0 times
cardinalitylowfieldsstringinteger
Problem
A Ticket has the statuses:
I could make
Int pros: smallest size in index
Int con: low clarity for BI and evolving schema (
String pro: clear data navigation
String con: takes more space to index, gives less performance for same RAM
I imagine a string index wont take too much space over an int index if its low cardinality. Is it premature optimization to choose integer if the field has low cardinality and is not part of a compound index?
I use Postgres with SQLAlchemy, Python ORM.
new
in_progress
on_hold
closedI could make
ticket.status a string (on_hold) or an unique int(2). It is indexed.Int pros: smallest size in index
Int con: low clarity for BI and evolving schema (
3 compared to closed)String pro: clear data navigation
String con: takes more space to index, gives less performance for same RAM
I imagine a string index wont take too much space over an int index if its low cardinality. Is it premature optimization to choose integer if the field has low cardinality and is not part of a compound index?
I use Postgres with SQLAlchemy, Python ORM.
Solution
I would definitely normalize for big tables. You could use an
For small tables either of your ideas is ok. If in doubt, stick to the guidelines of the project.
I would use neither. Instead:
For small tables: use an
An
For big tables: use a
status_id
status
n
new
i
in_progress
o
on_hold
c
closed
It also depends on the complete picture. You only actually save space (and gain performance) in table and indexes if it's not lost to alignment padding anyway. More:
For character types (except for
integer or smallint column with a FK constraint to a lookup table.For small tables either of your ideas is ok. If in doubt, stick to the guidelines of the project.
I would use neither. Instead:
For small tables: use an
enum. The documentation:An
enum value occupies four bytes on disk.For big tables: use a
"char" field (1 byte) as FK to a lookup table. A good choice for very small sets of lookup values. And each value can be a mnemonic for the referenced state:status_id
status
n
new
i
in_progress
o
on_hold
c
closed
It also depends on the complete picture. You only actually save space (and gain performance) in table and indexes if it's not lost to alignment padding anyway. More:
- Configuring PostgreSQL for read performance
For character types (except for
"char"), collation rules may be relevant, too. See:- Would index lookup be noticeably faster with char vs varchar when all values are 36 chars
Context
StackExchange Database Administrators Q#118695, answer score: 5
Revisions (0)
No revisions yet.