principlesqlCritical
Would index lookup be noticeably faster with char vs varchar when all values are 36 chars
Viewed 0 times
noticeablyindexallwithchararevarcharwouldfasterwhen
Problem
I have a legacy schema (disclaimer!) that uses a hash-based generated id for the primary key for all tables (there are many). An example of such an id is:
There is no possible hope of changing this approach, however performance with index access is poor. Setting aside the myriad of reasons this might be, there is one thing I noticed that seemed less than optimal - despite all id values in all many tables being exactly 36 characters in length, the column type is
Would altering the column types to fixed length
Ie does postgres perform much faster when dealing with fixed-length types than with variable length types?
Please don't mention the minuscule storage saving - that's not going be matter compared with the surgery required to make the change to the columns.
922475bb-ad93-43ee-9487-d2671b886479There is no possible hope of changing this approach, however performance with index access is poor. Setting aside the myriad of reasons this might be, there is one thing I noticed that seemed less than optimal - despite all id values in all many tables being exactly 36 characters in length, the column type is
varchar(36), not char(36).Would altering the column types to fixed length
char(36) offer any significant index performance benefits, beyond the very small increase in the number of entries per index page etc?Ie does postgres perform much faster when dealing with fixed-length types than with variable length types?
Please don't mention the minuscule storage saving - that's not going be matter compared with the surgery required to make the change to the columns.
Solution
No. No gain at all. The manual explicitly states:
Tip: There is no performance difference among these three types, apart
from increased storage space when using the blank-padded type, and a
few extra CPU cycles to check the length when storing into a
length-constrained column. While
advantages in some other database systems, there is no such advantage
in PostgreSQL; in fact
three because of its additional storage costs. In most situations
or
Bold emphasis mine.
If all strings are exactly 36 characters in length, there is no storage saving either way, not even a minuscule one. Both have exactly the same size on disk and in RAM. You can test with
And if all strings must have 36 characters, rather make it
You didn't ask for other options, but I'll mention two:
Unless you are running your DB with the "C" collation. Collation is often overlooked and possibly expensive. Since your strings don't seem to be meaningful in a natural language, there is probably no point in following
Extensive benchmark comparing (among other) the effect of
Your string suspiciously looks like a UUID (32 hex digits separated by 4 delimiters in canonical way). It's much more efficient to store UUIDs as actual
This may be helpful (last chapters):
See also:
Tip: There is no performance difference among these three types, apart
from increased storage space when using the blank-padded type, and a
few extra CPU cycles to check the length when storing into a
length-constrained column. While
character(n) has performanceadvantages in some other database systems, there is no such advantage
in PostgreSQL; in fact
character(n) is usually the slowest of thethree because of its additional storage costs. In most situations
textor
character varying should be used instead.Bold emphasis mine.
char(n) is a largely outdated, useless type. Stick with varchar(n). Without need to enforce a maximum length, varchar or text are a tiny bit faster, with fewer complications.If all strings are exactly 36 characters in length, there is no storage saving either way, not even a minuscule one. Both have exactly the same size on disk and in RAM. You can test with
pg_column_size() (on an expression and on a table column).And if all strings must have 36 characters, rather make it
text with a CHECK (length(col) = 36) constraint enforcing exact length, not varchar(36) only enforcing max. length. See:- Any downsides of using data type “text” for storing strings?
You didn't ask for other options, but I'll mention two:
COLLATION
Unless you are running your DB with the "C" collation. Collation is often overlooked and possibly expensive. Since your strings don't seem to be meaningful in a natural language, there is probably no point in following
COLLATION rules. Related:- How do I efficiently get "the most recent corresponding row"?
- EXECUTE within function not using index?
Extensive benchmark comparing (among other) the effect of
COLLATE "C" on performance:- Slow query ordering by a column in a joined table
- UUID
Your string suspiciously looks like a UUID (32 hex digits separated by 4 delimiters in canonical way). It's much more efficient to store UUIDs as actual
uuid data type: faster in multiple ways, and only occupies 16 bytes per UUID - as opposed to 37 bytes in RAM for either char(36) or varchar(36) (stored without delimiters, just the 32 defining characters), or 33 bytes on disk. But alignment padding would result in 40 bytes either way in many cases.) COLLATION is irrelevant for the uuid data type, too.SELECT '922475bb-ad93-43ee-9487-d2671b886479'::uuid;This may be helpful (last chapters):
- Convert hex in text representation to decimal number
See also:
- What is the overhead for varchar(n)?
- What is the optimal data type for an MD5 field?
Code Snippets
SELECT '922475bb-ad93-43ee-9487-d2671b886479'::uuid;Context
StackExchange Database Administrators Q#89429, answer score: 60
Revisions (0)
No revisions yet.