HiveBrain v1.2.0
Get Started
← Back to all entries
patternModerate

Is PostgreSQL JSONB @> operator equal to ->''=?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlequaljsonboperator

Problem

Is the jsonb_column @> '{"key":value}'::jsonb operator equal to jsonb_column->'key' = value? in terms of result, performance, and indexes that will be used?

Solution

I have created a small test for checking this:

CREATE TABLE jsonb_test (id serial, data jsonb);


Then inserted some generated data:

INSERT INTO jsonb_test (data) 
SELECT row_to_json(x.*)::jsonb 
  FROM (SELECT i AS bla, 
               ARRAY[i, i+1, i+2] AS foo, 
               ARRAY[('v' || i::text, i), ('v' || (i * 10)::text, i)] AS bar
          FROM generate_series(1,100000) t(i)
       ) x;


Let's find something in there (version 1):

SELECT * FROM jsonb_test WHERE data @> '{"bla": 545}'::jsonb;


The execution plan looks like

Seq Scan on jsonb_test  (cost=0.00..7604.89 rows=369 width=36) (actual time=0.425..47.600 rows=1 loops=1)
   Filter: (data @> '{"bla": 545}'::jsonb)
   Rows Removed by Filter: 99999
   Buffers: shared hit=2997 dirtied=676
 Planning time: 0.078 ms
 Execution time: 47.692 ms


If I do the same with the other version (version 2) you propose (with the operator changed from -> to ->> as I think you meant that):

SELECT * FROM jsonb_test WHERE data ->> 'bla' = '545';


I get a plan like

Seq Scan on jsonb_test  (cost=0.00..8526.47 rows=1843 width=36) (actual time=0.613..34.657 rows=1 loops=1)
   Filter: ((data ->> 'bla'::text) = '545'::text)
   Rows Removed by Filter: 99999
   Buffers: shared hit=2997
 Planning time: 0.116 ms
 Execution time: 34.727 ms


Running them repeatedly, I see no significant difference between the execution times. It is no wonder - the expensive operation here is the sequential scan, while the two different filter conditions should't be very different in this regard.

Now I add a GIN index on data:

CREATE INDEX ON jsonb_test USING gin (data);


When executing the queries now, version 2 does the same as above, while version 1 has a new plan:

Bitmap Heap Scan on jsonb_test  (cost=28.77..372.05 rows=100 width=36) (actual time=0.081..0.083 rows=1 loops=1)
   Recheck Cond: (data @> '{"bla": 545}'::jsonb)
   Rows Removed by Index Recheck: 2
   Heap Blocks: exact=1
   Buffers: shared hit=10
   ->  Bitmap Index Scan on jsonb_test_data_idx  (cost=0.00..28.75 rows=100 width=0) (actual time=0.062..0.062 rows=3 loops=1)
         Index Cond: (data @> '{"bla": 545}'::jsonb)
         Buffers: shared hit=9
 Planning time: 0.084 ms
 Execution time: 0.122 ms


This already shows that the two versions are not fully eqivalent - a GIN index on the jsonb column supports only one of them (version 1).

One can define an other index to help version 2:

CREATE INDEX ON jsonb_test ((data->>'bla'));


Now version 2 gives the following:

Bitmap Heap Scan on jsonb_test  (cost=12.17..1323.49 rows=500 width=36) (actual time=0.065..0.070 rows=1 loops=1)
   Recheck Cond: ((data ->> 'bla'::text) = '545'::text)
   Heap Blocks: exact=1
   Buffers: shared hit=1 read=2
   ->  Bitmap Index Scan on jsonb_test_expr_idx  (cost=0.00..12.04 rows=500 width=0) (actual time=0.046..0.046 rows=1 loops=1)
         Index Cond: ((data ->> 'bla'::text) = '545'::text)
         Buffers: shared read=2
 Planning time: 0.480 ms
 Execution time: 0.133 ms


In this experiment, I don't see a huge performance difference between the two, given proper indexing. However, you should consider the following:


Insertion into a GIN index can be slow due to the likelihood of many keys being inserted for each item. So, for bulk insertions into a table it is advisable to drop the GIN index and recreate it after finishing bulk insertion.

Usually it is said that BTree indexes are cheaper to maintain (meaning data changes on the table have a smaller overhead) that GIN indexes. From reading around, this is not always the case - you have to test these solutions on your own data set. For example, I cannot tell off the top of my head if a set of very complex jsonb values will prefer this or that.

Notes:

  • to understand the similarities and differences of the json(b) operators, the official documentation is an excellent source.



  • it might make sense reading Bruce Momjian's presentation about indexing techniques.

Code Snippets

CREATE TABLE jsonb_test (id serial, data jsonb);
INSERT INTO jsonb_test (data) 
SELECT row_to_json(x.*)::jsonb 
  FROM (SELECT i AS bla, 
               ARRAY[i, i+1, i+2] AS foo, 
               ARRAY[('v' || i::text, i), ('v' || (i * 10)::text, i)] AS bar
          FROM generate_series(1,100000) t(i)
       ) x;
SELECT * FROM jsonb_test WHERE data @> '{"bla": 545}'::jsonb;
Seq Scan on jsonb_test  (cost=0.00..7604.89 rows=369 width=36) (actual time=0.425..47.600 rows=1 loops=1)
   Filter: (data @> '{"bla": 545}'::jsonb)
   Rows Removed by Filter: 99999
   Buffers: shared hit=2997 dirtied=676
 Planning time: 0.078 ms
 Execution time: 47.692 ms
SELECT * FROM jsonb_test WHERE data ->> 'bla' = '545';

Context

StackExchange Database Administrators Q#88822, answer score: 13

Revisions (0)

No revisions yet.