snippetsqlMinor
Large hexadecimal PRIMARY KEY - how to index substrings of the PK field?
Viewed 0 times
substringsthehexadecimalprimaryfieldlargehowindexkey
Problem
I have a requirement for a very large (in terms of the number of rows, not data) database, and need to use a very long, unique, hexadecimal
As an example, lets say that the
So, apart from the usual searching by the
What is the best way to store such a primary key (assume that I am using MariaDB or MySQL)?
My thoughts are to do one of the following:
-
one big column,
-
have 8 columns and make a composite key,
-
or something else.
From a general database point of view, what would be the best technical solution to server my requirements?
Apart from the purely technical aspects of the question above, I would also be interested in any recommendations that users here might have as to the best database engine to use in this case?
I will only have a couple more tables, that will have
PRIMARY KEY instead of the usual AUTO_INCREMENT integer.As an example, lets say that the
PK is similar to an IPv6 address, say something like 2001:0db8:0000:0000:0000:8a2e:0370:7334.So, apart from the usual searching by the
PK, there will be a need to search by subfields of it - i.e. find all keys, that begin with "2001" (covered by the PK- should be sargeable), or (say) has 0db8 in their second set of four hex characters, and then to reference this 2nd key from other tables.What is the best way to store such a primary key (assume that I am using MariaDB or MySQL)?
My thoughts are to do one of the following:
-
one big column,
-
have 8 columns and make a composite key,
-
or something else.
From a general database point of view, what would be the best technical solution to server my requirements?
Apart from the purely technical aspects of the question above, I would also be interested in any recommendations that users here might have as to the best database engine to use in this case?
I will only have a couple more tables, that will have
FOREIGN KEY references to the long hex PRIMARY KEY. I can do without the DRI (Declarative Referential Integrity) if there are major performance benefitsSolution
Regular expressions can be a very powerful tool for various data storage, manipulation and indexing requirements - regexes' capabilities in conjunction with
In this particular case (rowing back from my initial answer!), MySQL or its derivatives are the only feasible solution. PostgreSQL doesn't allow
A relatively simple solution to your issue exists if (and maybe it's a big if), you have
You can use
-
PERSISTENT (a.k.a. STORED): This type's value [i.e. data] is actually stored in the table.
-
VIRTUAL: This type's value is not stored at all. Instead, the value [data] is generated dynamically when the table is queried. This type is the default.
(*) There are other terms for these columns, e.g.
MySQL's syntax is (versions 5.7 and up):
In order to use such columns as
"Simple" solution:
Your
(*) Not hex characters in this case - I messed around with MySQL's HEX() and UNHEX() functions to no avail. However, see discussion!
MySQL has an IS_IPV6() function, however, the MySQL 5.7 documentation also contains this gem:
The CHECK clause is parsed but ignored by all storage engines.
So, I'm going to use MySQL 8 which has implemented them - if you want valid IP_V6 addresses in a version prior to 8, then you'll have to use a trigger. MariaDB has
You mentioned that IPv6 was (only) an example - if any set of characters will do, then just remove the
I created this MySQL 8 table (see fiddle here):
I inserted a few records and then:
Result:
So far, so good - we have our tokens split out as required.
Now, we check the
So, I tested the
Complex solution (modelled on the above):
This part of the answer makes extensive use of regular expressions. I'm taking IPv6 as an exemplar of the power and flexibility of regexes - ultimately, you should be able to write your own in line with your particular r
GENERATED columns can be even more flexible and powerful and provide a decent solution to the OP's issue.In this particular case (rowing back from my initial answer!), MySQL or its derivatives are the only feasible solution. PostgreSQL doesn't allow
GENERATED fields to be FOREGIN KEYs unless they're UNIQUE. Obviously, the requirement to be able to use small substrings as FKs can't be satisfied in this case, despite the fact that MySQL's regex implementation leaves a lot to be desired! PG could use TRIGGERs - but that's another question!A relatively simple solution to your issue exists if (and maybe it's a big if), you have
PRIMARY KEY string separated by the same character. A more complex solution is possible using regular expressions - both options are explored below.You can use
GENERATED(*) columns (confusingly, these are also referred to as VIRTUAL in the Wikipedia entry, which is also one of the storage classes for such fields - the other storage classes are [STORED | PERSISTENT]. From the MariaDB documentation:-
PERSISTENT (a.k.a. STORED): This type's value [i.e. data] is actually stored in the table.
-
VIRTUAL: This type's value is not stored at all. Instead, the value [data] is generated dynamically when the table is queried. This type is the default.
(*) There are other terms for these columns, e.g.
COMPUTED BY or CALCULATED.MySQL's syntax is (versions 5.7 and up):
col_name data_type [GENERATED ALWAYS] AS (expr)
[VIRTUAL | STORED] -- storage class
[NOT NULL | NULL]
[UNIQUE [KEY]] [[PRIMARY] KEY]
[COMMENT 'string']In order to use such columns as
FOREIGN KEYs, their storage class has to be STORED."Simple" solution:
Your
PK is of the format (a fiddle with the code and tests for this part is available here):str:str:str:str:str:str:str:str (str = 1-4 characters(*) - separated by colons)(*) Not hex characters in this case - I messed around with MySQL's HEX() and UNHEX() functions to no avail. However, see discussion!
MySQL has an IS_IPV6() function, however, the MySQL 5.7 documentation also contains this gem:
- CHECK
The CHECK clause is parsed but ignored by all storage engines.
So, I'm going to use MySQL 8 which has implemented them - if you want valid IP_V6 addresses in a version prior to 8, then you'll have to use a trigger. MariaDB has
CHECK constraints from at least 10.3 onwards.You mentioned that IPv6 was (only) an example - if any set of characters will do, then just remove the
CHECK constraint - and see the complex solution below for a discussion on regular expressions.I created this MySQL 8 table (see fiddle here):
CREATE TABLE test
(
ip_v6 VARCHAR (39) NOT NULL,
host VARCHAR (100) NOT NULL,
ip_v6_token_1 VARCHAR (4) GENERATED ALWAYS AS
((SUBSTRING_INDEX(SUBSTRING_INDEX(ip_v6, ':', 1), ':', -1))) STORED,
ip_v6_token_2 VARCHAR (4) GENERATED ALWAYS AS
((SUBSTRING_INDEX(SUBSTRING_INDEX(ip_v6, ':', 2), ':', -1))) STORED,
INDEX (ip_v6_token_1), -- required for FOREIGN KEYs
INDEX (ip_v6_token_2), -- "
CONSTRAINT test_ip_v6_pk PRIMARY KEY (ip_v6),
CONSTRAINT ip_v6_valid_ck CHECK (IS_IPV6(ip_v6))
);
-- do the same for tokens 3 - 7 as per requirements!I inserted a few records and then:
SELECT
ip_v6_token_1 AS "Token 1",
ip_v6_token_2 AS "Token 2",
ip_v6 AS "IPv6:",
host AS "Host:"
FROM test ORDER BY host;Result:
Token 1 Token 2 IPv6: Host:
2001 0db8 2001:0db8:0000:FFF0:0000:8a2e:0370:7334 host_1
10 9 10:9:8:7:6:6:7:6 host_2
4001 1111 4001:1111:2222:FFF0:0000:8a2e:0370:7334 host_3So far, so good - we have our tokens split out as required.
Now, we check the
FOREIGN KEY support - create a table:CREATE TABLE test_fk
(
t_id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
--
-- Other fields
--
token_1 VARCHAR (4) NOT NULL,
token_2 VARCHAR (4),
CONSTRAINT test_fk_test_ip_v6_token_1_fk
FOREIGN KEY (token_1)
REFERENCES test (ip_v6_token_1) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT test_fk_test_ip_v6_token_2_fk
FOREIGN KEY (token_2)
REFERENCES test (ip_v6_token_2) ON DELETE CASCADE ON UPDATE CASCADE
);So, I tested the
ON DELETE CASCADE and the ON UPDATE CASCADE functionality and they appear to work. Also, when one tries to INSERT on test_fk it fails if there isn't a corresponding token in the ip_v6 field of the test parent table - see the fiddle for the tests.Complex solution (modelled on the above):
This part of the answer makes extensive use of regular expressions. I'm taking IPv6 as an exemplar of the power and flexibility of regexes - ultimately, you should be able to write your own in line with your particular r
Code Snippets
col_name data_type [GENERATED ALWAYS] AS (expr)
[VIRTUAL | STORED] -- storage class
[NOT NULL | NULL]
[UNIQUE [KEY]] [[PRIMARY] KEY]
[COMMENT 'string']CREATE TABLE test
(
ip_v6 VARCHAR (39) NOT NULL,
host VARCHAR (100) NOT NULL,
ip_v6_token_1 VARCHAR (4) GENERATED ALWAYS AS
((SUBSTRING_INDEX(SUBSTRING_INDEX(ip_v6, ':', 1), ':', -1))) STORED,
ip_v6_token_2 VARCHAR (4) GENERATED ALWAYS AS
((SUBSTRING_INDEX(SUBSTRING_INDEX(ip_v6, ':', 2), ':', -1))) STORED,
INDEX (ip_v6_token_1), -- required for FOREIGN KEYs
INDEX (ip_v6_token_2), -- "
CONSTRAINT test_ip_v6_pk PRIMARY KEY (ip_v6),
CONSTRAINT ip_v6_valid_ck CHECK (IS_IPV6(ip_v6))
);
-- do the same for tokens 3 - 7 as per requirements!SELECT
ip_v6_token_1 AS "Token 1",
ip_v6_token_2 AS "Token 2",
ip_v6 AS "IPv6:",
host AS "Host:"
FROM test ORDER BY host;Token 1 Token 2 IPv6: Host:
2001 0db8 2001:0db8:0000:FFF0:0000:8a2e:0370:7334 host_1
10 9 10:9:8:7:6:6:7:6 host_2
4001 1111 4001:1111:2222:FFF0:0000:8a2e:0370:7334 host_3CREATE TABLE test_fk
(
t_id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
--
-- Other fields
--
token_1 VARCHAR (4) NOT NULL,
token_2 VARCHAR (4),
CONSTRAINT test_fk_test_ip_v6_token_1_fk
FOREIGN KEY (token_1)
REFERENCES test (ip_v6_token_1) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT test_fk_test_ip_v6_token_2_fk
FOREIGN KEY (token_2)
REFERENCES test (ip_v6_token_2) ON DELETE CASCADE ON UPDATE CASCADE
);Context
StackExchange Database Administrators Q#283535, answer score: 3
Revisions (0)
No revisions yet.