HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Non-integer primary key considerations

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
considerationsprimarynonintegerkey

Problem

Context

I'm designing a database (on PostgreSQL 9.6) which will store data from a distributed application. Due to the application's distributed nature, I can not use auto-increment integers (SERIAL) as my primary key because of potential race-conditions.

The natural solution is to use an UUID, or a globally unique identifier. Postgres comes with a built-in UUID type, which is a perfect fit.

The problem I have with UUID is related to debugging: it's a non-human-friendly string. The identifier ff53e96d-5fd7-4450-bc99-111b91875ec5 tells me nothing, whereas ACC-f8kJd9xKCd, while not guaranteed to be unique, tells me I'm dealing with an ACC object.

From a programming perspective, it is common to debug application queries relating several different objects. Suppose the programmer wrongly searches for an ACC (account) object at the ORD (order) table. With a human-readable identifier, the programmer instantly identifies the problem, while using UUIDs he would spend some time figuring out what was wrong.

I do not need the "guaranteed" uniqueness of UUIDs; I do need some room for generating keys without conflicts, but UUID is overkill. Also, worst case scenario, it wouldn't be the end of the world if a collision happened (the database rejects it and the application can recover). So, trade-offs considered, a smaller but human-friendly identifier would be the ideal solution for my use case.

Identifying application objects

The identifier I came up with has the following format: {domain}-{string}, where {domain} is replaced with the object domain (account, order, product) and {string} is a randomly generated string. In some cases, it might even make sense to insert a {sub-domain} before the random string. Let's ignore the length of {domain} and {string} for the purpose of guaranteeing uniqueness.

The format can have a fixed size if it helps the indexing/querying performance.

The problem

Knowing that:

  • I want to have primary keys with a forma

Solution

Using ltree

If IPV6 works, great. It doesn't support "ACC". ltree does.


A label path is a sequence of zero or more labels separated by dots, for example L1.L2.L3, representing a path from the root of a hierarchical tree to a particular node. The length of a label path must be less than 65kB, but keeping it under 2kB is preferable. In practice this is not a major limitation; for example, the longest label path in the DMOZ catalog (http://www.dmoz.org) is about 240 bytes.

You'd use it like this,

CREATE EXTENSION ltree;
SELECT replace('ACC-f8kJd9xKCd', '-', '.')::ltree;


We create sample data.

SELECT x, (
  CASE WHEN x%7=0 THEN 'ACC'
    WHEN x%3=0 THEN 'XYZ'
    ELSE 'COM'
  END ||'.'|| md5(x::text)
  )::ltree
FROM generate_series(1,10000) AS t(x);

CREATE INDEX ON foo USING GIST (ltree);
ANALYZE foo;

  x  |                ltree                 
-----+--------------------------------------
   1 | COM.c4ca4238a0b923820dcc509a6f75849b
   2 | COM.c81e728d9d4c2f636f067f89cc14862c
   3 | XYZ.eccbc87e4b5ce2fe28308fd9f2a7baf3
   4 | COM.a87ff679a2f3e71d9181a67b7542122c
   5 | COM.e4da3b7fbbce2345d7772b0674a318d5
   6 | XYZ.1679091c5a880faf6fb5e6087eb1b2dc
   7 | ACC.8f14e45fceea167a5a36dedd4bea2543
   8 | COM.c9f0f895fb98ab9159f51fd0297e236d


And viola..

QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=103.23..234.91 rows=1414 width=57) (actual time=0.422..0.908 rows=1428 loops=1)
   Recheck Cond: ('ACC'::ltree @> ltree)
   Heap Blocks: exact=114
   ->  Bitmap Index Scan on foo_ltree_idx  (cost=0.00..102.88 rows=1414 width=0) (actual time=0.389..0.389 rows=1428 loops=1)
         Index Cond: ('ACC'::ltree @> ltree)
 Planning time: 0.133 ms
 Execution time: 1.033 ms
(7 rows)


See the docs for more info and operators

If you're creating the product ids, I would ltree. If you need something to create them, I would use UUID.

Code Snippets

CREATE EXTENSION ltree;
SELECT replace('ACC-f8kJd9xKCd', '-', '.')::ltree;
SELECT x, (
  CASE WHEN x%7=0 THEN 'ACC'
    WHEN x%3=0 THEN 'XYZ'
    ELSE 'COM'
  END ||'.'|| md5(x::text)
  )::ltree
FROM generate_series(1,10000) AS t(x);

CREATE INDEX ON foo USING GIST (ltree);
ANALYZE foo;


  x  |                ltree                 
-----+--------------------------------------
   1 | COM.c4ca4238a0b923820dcc509a6f75849b
   2 | COM.c81e728d9d4c2f636f067f89cc14862c
   3 | XYZ.eccbc87e4b5ce2fe28308fd9f2a7baf3
   4 | COM.a87ff679a2f3e71d9181a67b7542122c
   5 | COM.e4da3b7fbbce2345d7772b0674a318d5
   6 | XYZ.1679091c5a880faf6fb5e6087eb1b2dc
   7 | ACC.8f14e45fceea167a5a36dedd4bea2543
   8 | COM.c9f0f895fb98ab9159f51fd0297e236d
QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=103.23..234.91 rows=1414 width=57) (actual time=0.422..0.908 rows=1428 loops=1)
   Recheck Cond: ('ACC'::ltree @> ltree)
   Heap Blocks: exact=114
   ->  Bitmap Index Scan on foo_ltree_idx  (cost=0.00..102.88 rows=1414 width=0) (actual time=0.389..0.389 rows=1428 loops=1)
         Index Cond: ('ACC'::ltree @> ltree)
 Planning time: 0.133 ms
 Execution time: 1.033 ms
(7 rows)

Context

StackExchange Database Administrators Q#151253, answer score: 5

Revisions (0)

No revisions yet.