HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Do long names for jsonb keys use more storage?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
morelongstoragenameskeysforusejsonb

Problem

Consider the example: Two tables foo and bar, each with a jsonb column.

-
For foo, there are a million rows where the value of the jsonb is [{"a":123}].

-
For bar, there are a million rows where the value of the jsonb is [{"very_long_key_not_premature_optimization_at_all":123}]

The json key in bar is 46 characters longer than that in foo. Would the size of bar be 46 million bytes greater than foo?

Solution

Yes key length matters

The bigger the key, the more space to store it. JSONB doesn't do anything special to keys.

Test Cast

Sample Data

# CREATE TABLE foo AS SELECT '{"f":true}'::jsonb FROM generate_series(1,1e6);
SELECT 1000000

# CREATE TABLE bar AS SELECT '{"very_long_key_not_premature_optimization_at_all":true}'::jsonb FROM generate_series(1,1e6);
SELECT 1000000


Table sizes

Now look at the tables

# \dt+ foo;
                   List of relations
 Schema | Name | Type  |  Owner   | Size  | Description 
--------+------+-------+----------+-------+-------------
 public | foo  | table | ecarroll | 42 MB | 
(1 row)

test=# \dt+ bar;
                   List of relations
 Schema | Name | Type  |  Owner   | Size  | Description 
--------+------+-------+----------+-------+-------------
 public | bar  | table | ecarroll | 89 MB | 
(1 row)


Source Code

You can see it in the source here

str = TextDatumGetCString(in_datums[i * 2 + 1]);
len = strlen(str);

v.type = jbvString;

v.val.string.len = len;


ZSON Extension

You may consider checking out ZSON extension which provides JSONB cross-row dictionary compression


ZSON is a PostgreSQL extension for transparent JSONB compression. Compression is based on a shared dictionary of strings most frequently used in specific JSONB documents (not only keys, but also values, array elements, etc).


In some cases ZSON can save half of your disk space and give you about 10% more TPS. Memory is saved as well. See docs/benchmark.md. Everything depends on your data and workload though. Don't believe any benchmarks, re-check everything on your data, configuration, hardware, workload and PostgreSQL version.

Code Snippets

# CREATE TABLE foo AS SELECT '{"f":true}'::jsonb FROM generate_series(1,1e6);
SELECT 1000000

# CREATE TABLE bar AS SELECT '{"very_long_key_not_premature_optimization_at_all":true}'::jsonb FROM generate_series(1,1e6);
SELECT 1000000
# \dt+ foo;
                   List of relations
 Schema | Name | Type  |  Owner   | Size  | Description 
--------+------+-------+----------+-------+-------------
 public | foo  | table | ecarroll | 42 MB | 
(1 row)

test=# \dt+ bar;
                   List of relations
 Schema | Name | Type  |  Owner   | Size  | Description 
--------+------+-------+----------+-------+-------------
 public | bar  | table | ecarroll | 89 MB | 
(1 row)
str = TextDatumGetCString(in_datums[i * 2 + 1]);
len = strlen(str);

v.type = jbvString;

v.val.string.len = len;

Context

StackExchange Database Administrators Q#161864, answer score: 13

Revisions (0)

No revisions yet.