patternsqlMajor
Should I use UUID as well as ID
Viewed 0 times
uuidusewellshould
Problem
I've been using UUIDs in my systems for a while now for a variety of reasons ranging from logging to delayed correlation. The formats I used changed as I became less naive from:
It was when I reached the final one
I was initially sceptical towards using UUIDs as primary keys, and indeed I still am, however I see potential here to create a flexible database that can use both. Whereas many people stress over the advantages of either, what are the disadvantages cancelled out by using both data types?
The use case for this type of set up would be the traditional primary key for inter-table relationships, with unique identifier used for inter-system relationships.
What I am essentially trying to discover is difference in efficiency between the two approaches. Besides the quadruple disk space used, which may be largely negligible after additional data is added, they appear to me to be the same.
Schema:
``
--
-- --------------------------------------------------------
--
-- Table structure for tabl
VARCHAR(255)
VARCHAR(36)
CHAR(36)
BINARY(16)
It was when I reached the final one
BINARY(16) that I started to compare performance with basic auto-increment integer. The test and results are shown below, but if you just want the summary, it indicates that INT AUTOINCREMENT and BINARY(16) RANDOM have identical performance on data ranges up to 200,000 (the database was pre-populated prior to tests).I was initially sceptical towards using UUIDs as primary keys, and indeed I still am, however I see potential here to create a flexible database that can use both. Whereas many people stress over the advantages of either, what are the disadvantages cancelled out by using both data types?
PRIMARY INT
UNIQUE BINARY(16)
The use case for this type of set up would be the traditional primary key for inter-table relationships, with unique identifier used for inter-system relationships.
What I am essentially trying to discover is difference in efficiency between the two approaches. Besides the quadruple disk space used, which may be largely negligible after additional data is added, they appear to me to be the same.
Schema:
``
-- phpMyAdmin SQL Dump
-- version 4.0.10deb1
-- http://www.phpmyadmin.net
--
-- Host: localhost
-- Generation Time: Sep 22, 2015 at 10:54 AM
-- Server version: 5.5.44-0ubuntu0.14.04.1
-- PHP Version: 5.5.29-1+deb.sury.org~trusty+3
SET SQL_MODE = "NO_AUTO_VALUE_ON_ZERO";
SET time_zone = "+00:00";
/!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT /;
/!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS /;
/!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION /;
/!40101 SET NAMES utf8 /;
--
-- Database: test`--
-- --------------------------------------------------------
--
-- Table structure for tabl
Solution
UUIDs are a performance disaster for very large tables. (200K rows is not "very large".)
Your #3 is really bad when the
UUIDs (GUIDs) are very "random". Using them as either a UNIQUE or a PRIMARY key on large tables is very inefficient. This is because of having to jump around the table/index each time you
So, don't use UUIDs unless either
More on UUIDs: http://mysql.rjweb.org/doc.php/uuid (It includes functions for converting between standard 36-char
Having both a UNIQUE
For comparision:
Italics Updates/etc were added Sep, 2017; nothing critical changed.
Addressing Ivan's answer and the link he provided.
Your #3 is really bad when the
CHARCTER SET is utf8 -- CHAR(36) occupies 108 bytes! Update: There are ROW_FORMATs for which this will stay 36.UUIDs (GUIDs) are very "random". Using them as either a UNIQUE or a PRIMARY key on large tables is very inefficient. This is because of having to jump around the table/index each time you
INSERT a new UUID or SELECT by UUID. When the table/index is too large to fit in cache (see innodb_buffer_pool_size, which must be smaller than RAM, typically 70%), the 'next' UUID may not be cached, hence a slow disk hit. When the table/index is 20 times as big as the cache, only 1/20th (5%) of hits are cached -- you are I/O-bound. Generalization: The inefficiency applies to any "random" access -- UUID / MD5 / RAND() / etcSo, don't use UUIDs unless either
- you have "small" tables, or
- you really need them because of generating unique ids from different places (and have not figured out another way to do it).
More on UUIDs: http://mysql.rjweb.org/doc.php/uuid (It includes functions for converting between standard 36-char
UUIDs and BINARY(16).) Update: MySQL 8.0 has a builtin function for such.Having both a UNIQUE
AUTO_INCREMENT and a UNIQUE UUID in the same table is a waste.- When an
INSERToccurs, all unique/primary keys must be checked for duplicates.
- Either unique key is sufficient for InnoDB's requirement of having a
PRIMARY KEY.
BINARY(16)(16 bytes) is somewhat bulky (an argument against making it the PK), but not that bad.
- The bulkiness matters when you have secondary keys. InnoDB silently tacks the PK onto the end of each secondary key. The main lesson here is to minimize the number of secondary keys, especially for very large tables. Elaboration: For one secondary key, the bulkiness debate usually ends in a draw. For 2 or more secondary keys, a fatter PK usually leads to a bigger disk footprint for the table including its indexes.
For comparision:
INT UNSIGNED is 4 bytes with range of 0..4 billion. BIGINT is 8 bytes.Italics Updates/etc were added Sep, 2017; nothing critical changed.
Addressing Ivan's answer and the link he provided.
- Yes, there are benefits of a distributed id-generator. But...
- There are other ways to avoid the delay of a centralized ID server -- Dispense 100 consecutive ids at a time to the client. Or use a client-generated concatenation of client_id and time.
- If the entire index is cached in RAM, then much of my argument is moot. But once it spills to disk, performance suffers severely. His article does briefly point out that UUIDv4 is slower than UUIDv1, but fails to point out that the bits need rearranging to achieve the benefit.
- I agree with squeezing a 36-byte UUID into a 16-byte
BINARY. MySQL 8 even has a function to do such. Furthermore, it (and my blog) rearrange the bits so that UUIDv1 has the temporal characteristic of an auto_increment.
- Sharding, I contend, is not normally done against auto_increment ids, so switching to UUIDs does not buy anything.
Context
StackExchange Database Administrators Q#115766, answer score: 21
Revisions (0)
No revisions yet.