patternsqlModerate
Should I invest the time to change the column type from CHAR(36) to UUID?
Viewed 0 times
theinvestcolumnchartimetypeshoulduuidfromchange
Problem
I have a few million rows in my database already. I didn't know about the PostgreSQL UUID data type when I designed my schema.
One of the tables has 16M rows (about 3.5M to 4 M records per shard), growing at about 500K records per day. I still have the luxury of taking the production system down for a few hours if required. I won't have this luxury in one or two weeks.
My question is, will it be worthwhile to do so? I'm wondering about JOIN performance, disk space use (full gzip'd dump is 1.25 GiB), things of that nature.
Table schema is:
One of the tables has 16M rows (about 3.5M to 4 M records per shard), growing at about 500K records per day. I still have the luxury of taking the production system down for a few hours if required. I won't have this luxury in one or two weeks.
My question is, will it be worthwhile to do so? I'm wondering about JOIN performance, disk space use (full gzip'd dump is 1.25 GiB), things of that nature.
Table schema is:
# \d twitter_interactions
Table "public.twitter_interactions"
Column | Type | Modifiers
-------------------------+-----------------------------+-----------
interaction_id | character(36) | not null
status_text | character varying(1024) | not null
screen_name | character varying(40) | not null
twitter_user_id | bigint |
replying_to_screen_name | character varying(40) |
source | character varying(240) | not null
tweet_id | bigint | not null
created_at | timestamp without time zone | not null
Indexes:
"twitter_interactions_pkey" PRIMARY KEY, btree (interaction_id)
"twitter_interactions_tweet_id_key" UNIQUE, btree (tweet_id)
"index_twitter_interactions_on_created_at" btree (created_at)
"index_twitter_interactions_on_screen_name" btree (screen_name)
Triggers:
insert_twitter_interactions_trigger BEFORE INSERT ON twitter_interactions FOR EACH ROW EXECUTE PROCEDURE twitter_interactions_insert_trigger()
Number of child tables: 9 (Use \d+ to list them.)Solution
I would consider changing to the UUID type.
Also, I see no constraint in you schema that ensures that
If you like this, however, than using
char(36) takes 40 bytes, uuid takes 16, so you'll save 24 bytes per row, which for you will equate to 12 MB a day, 4 GB after a year. Plus indexes. Depending on what hardware you have, that isn't much, but it could be. And it adds up if you have more improvement opportunities like this.Also, I see no constraint in you schema that ensures that
interaction_id is actually in the right format. Using the right type will give you that as well.If you like this, however, than using
bigint would save even more and have even better performance. It's very unlikely that your application is so large that a bigint for an ID column won't work.Context
StackExchange Database Administrators Q#6912, answer score: 13
Revisions (0)
No revisions yet.