patternsqlMinor
Non-integer primary key considerations
Viewed 0 times
considerationsprimarynonintegerkey
Problem
Context
I'm designing a database (on PostgreSQL 9.6) which will store data from a distributed application. Due to the application's distributed nature, I can not use auto-increment integers (
The natural solution is to use an UUID, or a globally unique identifier. Postgres comes with a built-in
The problem I have with UUID is related to debugging: it's a non-human-friendly string. The identifier
From a programming perspective, it is common to debug application queries relating several different objects. Suppose the programmer wrongly searches for an
I do not need the "guaranteed" uniqueness of UUIDs; I do need some room for generating keys without conflicts, but UUID is overkill. Also, worst case scenario, it wouldn't be the end of the world if a collision happened (the database rejects it and the application can recover). So, trade-offs considered, a smaller but human-friendly identifier would be the ideal solution for my use case.
Identifying application objects
The identifier I came up with has the following format:
The format can have a fixed size if it helps the indexing/querying performance.
The problem
Knowing that:
I'm designing a database (on PostgreSQL 9.6) which will store data from a distributed application. Due to the application's distributed nature, I can not use auto-increment integers (
SERIAL) as my primary key because of potential race-conditions.The natural solution is to use an UUID, or a globally unique identifier. Postgres comes with a built-in
UUID type, which is a perfect fit.The problem I have with UUID is related to debugging: it's a non-human-friendly string. The identifier
ff53e96d-5fd7-4450-bc99-111b91875ec5 tells me nothing, whereas ACC-f8kJd9xKCd, while not guaranteed to be unique, tells me I'm dealing with an ACC object.From a programming perspective, it is common to debug application queries relating several different objects. Suppose the programmer wrongly searches for an
ACC (account) object at the ORD (order) table. With a human-readable identifier, the programmer instantly identifies the problem, while using UUIDs he would spend some time figuring out what was wrong.I do not need the "guaranteed" uniqueness of UUIDs; I do need some room for generating keys without conflicts, but UUID is overkill. Also, worst case scenario, it wouldn't be the end of the world if a collision happened (the database rejects it and the application can recover). So, trade-offs considered, a smaller but human-friendly identifier would be the ideal solution for my use case.
Identifying application objects
The identifier I came up with has the following format:
{domain}-{string}, where {domain} is replaced with the object domain (account, order, product) and {string} is a randomly generated string. In some cases, it might even make sense to insert a {sub-domain} before the random string. Let's ignore the length of {domain} and {string} for the purpose of guaranteeing uniqueness.The format can have a fixed size if it helps the indexing/querying performance.
The problem
Knowing that:
- I want to have primary keys with a forma
Solution
Using
If IPV6 works, great. It doesn't support "ACC".
A label path is a sequence of zero or more labels separated by dots, for example L1.L2.L3, representing a path from the root of a hierarchical tree to a particular node. The length of a label path must be less than 65kB, but keeping it under 2kB is preferable. In practice this is not a major limitation; for example, the longest label path in the DMOZ catalog (http://www.dmoz.org) is about 240 bytes.
You'd use it like this,
We create sample data.
And viola..
See the docs for more info and operators
If you're creating the product ids, I would ltree. If you need something to create them, I would use UUID.
ltreeIf IPV6 works, great. It doesn't support "ACC".
ltree does.A label path is a sequence of zero or more labels separated by dots, for example L1.L2.L3, representing a path from the root of a hierarchical tree to a particular node. The length of a label path must be less than 65kB, but keeping it under 2kB is preferable. In practice this is not a major limitation; for example, the longest label path in the DMOZ catalog (http://www.dmoz.org) is about 240 bytes.
You'd use it like this,
CREATE EXTENSION ltree;
SELECT replace('ACC-f8kJd9xKCd', '-', '.')::ltree;We create sample data.
SELECT x, (
CASE WHEN x%7=0 THEN 'ACC'
WHEN x%3=0 THEN 'XYZ'
ELSE 'COM'
END ||'.'|| md5(x::text)
)::ltree
FROM generate_series(1,10000) AS t(x);
CREATE INDEX ON foo USING GIST (ltree);
ANALYZE foo;
x | ltree
-----+--------------------------------------
1 | COM.c4ca4238a0b923820dcc509a6f75849b
2 | COM.c81e728d9d4c2f636f067f89cc14862c
3 | XYZ.eccbc87e4b5ce2fe28308fd9f2a7baf3
4 | COM.a87ff679a2f3e71d9181a67b7542122c
5 | COM.e4da3b7fbbce2345d7772b0674a318d5
6 | XYZ.1679091c5a880faf6fb5e6087eb1b2dc
7 | ACC.8f14e45fceea167a5a36dedd4bea2543
8 | COM.c9f0f895fb98ab9159f51fd0297e236dAnd viola..
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on foo (cost=103.23..234.91 rows=1414 width=57) (actual time=0.422..0.908 rows=1428 loops=1)
Recheck Cond: ('ACC'::ltree @> ltree)
Heap Blocks: exact=114
-> Bitmap Index Scan on foo_ltree_idx (cost=0.00..102.88 rows=1414 width=0) (actual time=0.389..0.389 rows=1428 loops=1)
Index Cond: ('ACC'::ltree @> ltree)
Planning time: 0.133 ms
Execution time: 1.033 ms
(7 rows)See the docs for more info and operators
If you're creating the product ids, I would ltree. If you need something to create them, I would use UUID.
Code Snippets
CREATE EXTENSION ltree;
SELECT replace('ACC-f8kJd9xKCd', '-', '.')::ltree;SELECT x, (
CASE WHEN x%7=0 THEN 'ACC'
WHEN x%3=0 THEN 'XYZ'
ELSE 'COM'
END ||'.'|| md5(x::text)
)::ltree
FROM generate_series(1,10000) AS t(x);
CREATE INDEX ON foo USING GIST (ltree);
ANALYZE foo;
x | ltree
-----+--------------------------------------
1 | COM.c4ca4238a0b923820dcc509a6f75849b
2 | COM.c81e728d9d4c2f636f067f89cc14862c
3 | XYZ.eccbc87e4b5ce2fe28308fd9f2a7baf3
4 | COM.a87ff679a2f3e71d9181a67b7542122c
5 | COM.e4da3b7fbbce2345d7772b0674a318d5
6 | XYZ.1679091c5a880faf6fb5e6087eb1b2dc
7 | ACC.8f14e45fceea167a5a36dedd4bea2543
8 | COM.c9f0f895fb98ab9159f51fd0297e236dQUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on foo (cost=103.23..234.91 rows=1414 width=57) (actual time=0.422..0.908 rows=1428 loops=1)
Recheck Cond: ('ACC'::ltree @> ltree)
Heap Blocks: exact=114
-> Bitmap Index Scan on foo_ltree_idx (cost=0.00..102.88 rows=1414 width=0) (actual time=0.389..0.389 rows=1428 loops=1)
Index Cond: ('ACC'::ltree @> ltree)
Planning time: 0.133 ms
Execution time: 1.033 ms
(7 rows)Context
StackExchange Database Administrators Q#151253, answer score: 5
Revisions (0)
No revisions yet.