HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Low cardinality fields: integer or string

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
cardinalitylowfieldsstringinteger

Problem

A Ticket has the statuses:

new
in_progress
on_hold
closed


I could make ticket.status a string (on_hold) or an unique int(2). It is indexed.

Int pros: smallest size in index

Int con: low clarity for BI and evolving schema (3 compared to closed)

String pro: clear data navigation

String con: takes more space to index, gives less performance for same RAM

I imagine a string index wont take too much space over an int index if its low cardinality. Is it premature optimization to choose integer if the field has low cardinality and is not part of a compound index?

I use Postgres with SQLAlchemy, Python ORM.

Solution

I would definitely normalize for big tables. You could use an integer or smallint column with a FK constraint to a lookup table.

For small tables either of your ideas is ok. If in doubt, stick to the guidelines of the project.

I would use neither. Instead:

For small tables: use an enum. The documentation:

An enum value occupies four bytes on disk.

For big tables: use a "char" field (1 byte) as FK to a lookup table. A good choice for very small sets of lookup values. And each value can be a mnemonic for the referenced state:

status_id
status

n
new

i
in_progress

o
on_hold

c
closed

It also depends on the complete picture. You only actually save space (and gain performance) in table and indexes if it's not lost to alignment padding anyway. More:

  • Configuring PostgreSQL for read performance



For character types (except for "char"), collation rules may be relevant, too. See:

  • Would index lookup be noticeably faster with char vs varchar when all values are 36 chars

Context

StackExchange Database Administrators Q#118695, answer score: 5

Revisions (0)

No revisions yet.