HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Efficient key value store in Postgres

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresefficientvaluestorekey

Problem

I want to use Postgres(11 in CloudSQL) as an efficient key value store. I have about 200GB of dictionaries (average size is 10kB, the structure can be different and nested). I'm thinking about leveraging improved hash indexes. Here is the schema:

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

        CREATE TABLE IF NOT EXISTS key_val (
            id uuid DEFAULT uuid_generate_v4(),
            value jsonb,
            EXCLUDE using hash (id with =)
        );

        CREATE INDEX IF NOT EXISTS idx_key_val ON key_val USING hash (id);


Get, Update and Insert are quite simple, but I'm not sure how to implement efficient upsert.

INSERT INTO key_val VALUES ($1, $2) ON CONFLICT ON CONSTRAINT key_val_id_excl DO UPDATE SET value = ($2)


results in WrongObjectTypeError ON CONFLICT DO UPDATE not supported with exclusion constraints

Possible solution:

  • Upsert https://stackoverflow.com/a/8702291/336186



  • Use hstore, but I'd lose json query improvements and a single dictionary can have 20kB+



  • Use btree - Lookups would be slower (estimate is 2x-3x) and the index would be much larger.



  • Another option



EDIT

Info:
Mac 2,3 GHz Intel Core i9; 16GB RAM
(PostgreSQL) 11.4

\d+
public | user_profiles | table | postgres | 16 GB
# num of records
SELECT COUNT(*) FROM user_profiles -> 3 095 348
# hash index
SELECT pg_table_size('idx_user_profiles');
87 334 912
SELECT pg_table_size('idx_user_profiles_btree')
97 705 984


For Btree

```
postgres=# \d user_profiles
Table "public.user_profiles"
Column | Type | Collation | Nullable | Default
--------+-------+-----------+----------+--------------------
key | uuid | | not null | uuid_generate_v4()
value | jsonb | | |
Indexes:
"user_profiles_pkey" PRIMARY KEY, btree (key)

postgres=# SELECT * FROM user_profiles WHERE key = '2cfc4dbf-a1b9-46b3-8c15-a03f51dde890';
Time: 3.126 ms

INSERT INTO user_profiles (value) VALUES ('{"type": "_app_retail","user_id"

Solution

-
Number one: PostgreSQL will never be great as a key-value store if you have many UPDATEs. Workloads with many UPDATEs are just hard for PostgreSQL's architecture.

Make sure that you create your table with a fillfactor way below 100, so that you can make use of HOT updates. This is the only way you can survive workloads with many UPDATEs. Make sure that there is no index on value, otherwise it won't work.

Also, since there will be considerable churn on the TOAST table of that table, make sure to set autovacuum_work_mem high and toast.autovacuum_vacuum_cost_delay = 0 on that table, so that autovacuum has a chance to keep up.

-
Number two: Don't use hash indexes. I don't buy for a second that they are two or three times as fast. You'd have to benchmark this to convince me.

With a B-tree index, your problem would simply go away: just put a primary key constraint on id.

-
Number three: storing the dictionaries as a large jsonb means that you have to store a new copy of the whole whenever you update even a single value of the jsonb.

You might be better off by modeling this in a relational way. This will of course only be a good option if there are no nested structures in the jsonb.

Context

StackExchange Database Administrators Q#255022, answer score: 7

Revisions (0)

No revisions yet.