patternsqlMinor
MySQL Large Table Indexes - Recommended length of index
Viewed 0 times
recommendedlengthindexesmysqllargeindextable
Problem
I am working with a table that has a couple hundred million rows of data. It uses varchar(32) UUID's as it's keys both primary and foreign (there are two). I was wondering with this size of table, how much of each field do I need to index to keep good performance but not index every character of those three fields.
For example if I set an index length of 24 characters, will I see any noticeable difference in performance? I was hoping there might be some kind of algorithm out there that can be used to estimate performance based on length of field and length of index.
Currently the table is ~100GB in size, and the index is taking up over half of that space. I would like to recover some of that space, and reduce the growth rate of the table.
For example if I set an index length of 24 characters, will I see any noticeable difference in performance? I was hoping there might be some kind of algorithm out there that can be used to estimate performance based on length of field and length of index.
Currently the table is ~100GB in size, and the index is taking up over half of that space. I would like to recover some of that space, and reduce the growth rate of the table.
Solution
This is not an answer to your question (how to cut index length), but an alternative to reach the same objective: use less space in your DB.
If you want to optimize storage you should not store UUIDs as
You can store them as
My response is heavily inspired in Storing UUID Values in MySQL Tables, which I would recommend you read.
The following example shows you how to store the same UUID data in the two different formats:
Both
Assuming we have a table called
... and we add also a well known value:
You can then query the table and look for the known value using either of the representations:
You get, in both cases, the same result:
uuid_a | hex(uuid_b)
:------------------------------- | :-------------------------------
aab5d5fd70c111e5a4fbb026b977eb28 | AAB5D5FD70C111E5A4FBB026B977EB28
In practice, you would only use
By doing this, you halve the amount of space used to store your UUIDs (both in table and indexes), without having to resort to using a substring in your index (which can affect your performance, and still not save that much).
See all the code at dbfiddle here
If you want to optimize storage you should not store UUIDs as
varchar(32) (nor character(32)). You can store them as
binary(16) with no loss of information, and save (at least) 50% of space. A UUID contains 128 bits of information. If stored as a text representing the hexadecimal value, you use 4 bits of information out of every character, which employs 8 bits assuming the text is encoded as ASCII or latin1. So, you waste half your available bits. Storing the information in binary format, you waste none.My response is heavily inspired in Storing UUID Values in MySQL Tables, which I would recommend you read.
The following example shows you how to store the same UUID data in the two different formats:
CREATE TABLE t
(
uuid_a character(32) PRIMARY KEY
,uuid_b binary(16) UNIQUE KEY
) ;Both
uuid_a and uuid_b will be used to store the same UUID, encoded in two different ways.Assuming we have a table called
generator_64k with 64k of data... we fill t with lots of uuid() using:-- We fill it with random data
INSERT INTO
t
SELECT
replace(u,'-','') -- text version
,unhex(replace(u,'-','')) -- binary version
FROM
(SELECT
uuid() AS u
FROM
generator_64k
) AS s0 ;... and we add also a well known value:
-- We fill it with one known piece of data
INSERT INTO
t
SELECT
replace(u,'-','') -- text version
,unhex(replace(u,'-','')) -- binary version
FROM
(SELECT
'aab5d5fd-70c1-11e5-a4fb-b026b977eb28' AS u
) AS s0 ;You can then query the table and look for the known value using either of the representations:
-- Encoded as character(32)
SELECT
uuid_a, hex(uuid_b)
FROM
t
WHERE
uuid_a = replace('aab5d5fd-70c1-11e5-a4fb-b026b977eb28', '-', '') ;
-- Encoded as binary(16)
SELECT
uuid_a, hex(uuid_b)
FROM
t
WHERE
uuid_b = unhex(replace('aab5d5fd-70c1-11e5-a4fb-b026b977eb28', '-', '')) ;You get, in both cases, the same result:
uuid_a | hex(uuid_b)
:------------------------------- | :-------------------------------
aab5d5fd70c111e5a4fbb026b977eb28 | AAB5D5FD70C111E5A4FBB026B977EB28
In practice, you would only use
uuid_b in your table.By doing this, you halve the amount of space used to store your UUIDs (both in table and indexes), without having to resort to using a substring in your index (which can affect your performance, and still not save that much).
See all the code at dbfiddle here
Code Snippets
CREATE TABLE t
(
uuid_a character(32) PRIMARY KEY
,uuid_b binary(16) UNIQUE KEY
) ;-- We fill it with random data
INSERT INTO
t
SELECT
replace(u,'-','') -- text version
,unhex(replace(u,'-','')) -- binary version
FROM
(SELECT
uuid() AS u
FROM
generator_64k
) AS s0 ;-- We fill it with one known piece of data
INSERT INTO
t
SELECT
replace(u,'-','') -- text version
,unhex(replace(u,'-','')) -- binary version
FROM
(SELECT
'aab5d5fd-70c1-11e5-a4fb-b026b977eb28' AS u
) AS s0 ;-- Encoded as character(32)
SELECT
uuid_a, hex(uuid_b)
FROM
t
WHERE
uuid_a = replace('aab5d5fd-70c1-11e5-a4fb-b026b977eb28', '-', '') ;
-- Encoded as binary(16)
SELECT
uuid_a, hex(uuid_b)
FROM
t
WHERE
uuid_b = unhex(replace('aab5d5fd-70c1-11e5-a4fb-b026b977eb28', '-', '')) ;Context
StackExchange Database Administrators Q#176318, answer score: 2
Revisions (0)
No revisions yet.