patternsqlMinor
Efficient key value store in Postgres
Viewed 0 times
postgresefficientvaluestorekey
Problem
I want to use Postgres(11 in CloudSQL) as an efficient key value store. I have about 200GB of dictionaries (average size is 10kB, the structure can be different and nested). I'm thinking about leveraging improved hash indexes. Here is the schema:
Get, Update and Insert are quite simple, but I'm not sure how to implement efficient upsert.
results in
Possible solution:
EDIT
Info:
Mac 2,3 GHz Intel Core i9; 16GB RAM
(PostgreSQL) 11.4
For Btree
```
postgres=# \d user_profiles
Table "public.user_profiles"
Column | Type | Collation | Nullable | Default
--------+-------+-----------+----------+--------------------
key | uuid | | not null | uuid_generate_v4()
value | jsonb | | |
Indexes:
"user_profiles_pkey" PRIMARY KEY, btree (key)
postgres=# SELECT * FROM user_profiles WHERE key = '2cfc4dbf-a1b9-46b3-8c15-a03f51dde890';
Time: 3.126 ms
INSERT INTO user_profiles (value) VALUES ('{"type": "_app_retail","user_id"
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE TABLE IF NOT EXISTS key_val (
id uuid DEFAULT uuid_generate_v4(),
value jsonb,
EXCLUDE using hash (id with =)
);
CREATE INDEX IF NOT EXISTS idx_key_val ON key_val USING hash (id);Get, Update and Insert are quite simple, but I'm not sure how to implement efficient upsert.
INSERT INTO key_val VALUES ($1, $2) ON CONFLICT ON CONSTRAINT key_val_id_excl DO UPDATE SET value = ($2)results in
WrongObjectTypeError ON CONFLICT DO UPDATE not supported with exclusion constraintsPossible solution:
- Upsert https://stackoverflow.com/a/8702291/336186
- Use hstore, but I'd lose json query improvements and a single dictionary can have 20kB+
- Use btree - Lookups would be slower (estimate is 2x-3x) and the index would be much larger.
- Another option
EDIT
Info:
Mac 2,3 GHz Intel Core i9; 16GB RAM
(PostgreSQL) 11.4
\d+
public | user_profiles | table | postgres | 16 GB
# num of records
SELECT COUNT(*) FROM user_profiles -> 3 095 348
# hash index
SELECT pg_table_size('idx_user_profiles');
87 334 912
SELECT pg_table_size('idx_user_profiles_btree')
97 705 984For Btree
```
postgres=# \d user_profiles
Table "public.user_profiles"
Column | Type | Collation | Nullable | Default
--------+-------+-----------+----------+--------------------
key | uuid | | not null | uuid_generate_v4()
value | jsonb | | |
Indexes:
"user_profiles_pkey" PRIMARY KEY, btree (key)
postgres=# SELECT * FROM user_profiles WHERE key = '2cfc4dbf-a1b9-46b3-8c15-a03f51dde890';
Time: 3.126 ms
INSERT INTO user_profiles (value) VALUES ('{"type": "_app_retail","user_id"
Solution
-
Number one: PostgreSQL will never be great as a key-value store if you have many
Make sure that you create your table with a
Also, since there will be considerable churn on the TOAST table of that table, make sure to set
-
Number two: Don't use hash indexes. I don't buy for a second that they are two or three times as fast. You'd have to benchmark this to convince me.
With a B-tree index, your problem would simply go away: just put a primary key constraint on
-
Number three: storing the dictionaries as a large
You might be better off by modeling this in a relational way. This will of course only be a good option if there are no nested structures in the
Number one: PostgreSQL will never be great as a key-value store if you have many
UPDATEs. Workloads with many UPDATEs are just hard for PostgreSQL's architecture.Make sure that you create your table with a
fillfactor way below 100, so that you can make use of HOT updates. This is the only way you can survive workloads with many UPDATEs. Make sure that there is no index on value, otherwise it won't work.Also, since there will be considerable churn on the TOAST table of that table, make sure to set
autovacuum_work_mem high and toast.autovacuum_vacuum_cost_delay = 0 on that table, so that autovacuum has a chance to keep up.-
Number two: Don't use hash indexes. I don't buy for a second that they are two or three times as fast. You'd have to benchmark this to convince me.
With a B-tree index, your problem would simply go away: just put a primary key constraint on
id.-
Number three: storing the dictionaries as a large
jsonb means that you have to store a new copy of the whole whenever you update even a single value of the jsonb.You might be better off by modeling this in a relational way. This will of course only be a good option if there are no nested structures in the
jsonb.Context
StackExchange Database Administrators Q#255022, answer score: 7
Revisions (0)
No revisions yet.