patternsqlModerate
Search for nested values in jsonb array with greater operator
Viewed 0 times
searchoperatorarraywithgreaternestedforvaluesjsonb
Problem
Here is the table definition (simplified):
Sample values:
I want to search all documents where one of the
My query is
But, as I want to limit to 50 documents, I have to group on the
With millions of documents, this query is very expensive - 10 seconds with 1 million.
I try to add a GIN index on the array of the jsonb object. But it seems it's applied only while using a jsonb operator like
Do you have some ideas to have better performance?
CREATE TABLE documents (
document_id int4 NOT NULL GENERATED BY DEFAULT AS IDENTITY,
data_block jsonb NULL
);Sample values:
INSERT INTO documents (document_id, data_block)
VALUES
(878979,
'{"COMMONS": {"DATE": {"value": "2017-03-11"}},
"PAYABLE_INVOICE_LINES": [
{"AMOUNT": {"value": 52408.53}},
{"AMOUNT": {"value": 654.23}}
]}')
, (977656,
'{"COMMONS": {"DATE": {"value": "2018-03-11"}},
"PAYABLE_INVOICE_LINES": [
{"AMOUNT": {"value": 555.10}}
]}');I want to search all documents where one of the
'PAYABLE_INVOICE_LINES' elements contains a 'value' greater than 1000.00.My query is
select *
from documents d
cross join lateral jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') as pil
where (pil->'AMOUNT'->>'value')::decimal > 1000But, as I want to limit to 50 documents, I have to group on the
document_id and limit the result to 50.With millions of documents, this query is very expensive - 10 seconds with 1 million.
I try to add a GIN index on the array of the jsonb object. But it seems it's applied only while using a jsonb operator like
@>.Do you have some ideas to have better performance?
Solution
This is generally hard to optimize: no direct operator or index support for
Related:
-
Indexed range comparison of array key's value in row inside JSONB
-
Speed up range test for key values nested in jsonb array of objects
To make this faster by orders of magnitude, extract the maximum value per row and save it redundantly or use an
Query (must match index expression):
dbfiddle here
jsonb for this kind of test.EXISTS should at least be faster than what you have, while also avoiding duplicate rows (where multiple array elements match) and the additional (redundant) column pil in the result:SELECT *
FROM documents d
WHERE EXISTS (
SELECT FROM jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') pil
WHERE (pil->'AMOUNT'->>'value')::decimal > 1000
);Related:
-
Indexed range comparison of array key's value in row inside JSONB
-
Speed up range test for key values nested in jsonb array of objects
To make this faster by orders of magnitude, extract the maximum value per row and save it redundantly or use an
IMMUTABLE function in a very small and fast (but also specialized) expression index:CREATE OR REPLACE FUNCTION f_doc_max_amout(jsonb)
RETURNS numeric AS
$func$
SELECT max((a->'AMOUNT'->>'value')::numeric)
FROM jsonb_array_elements($1) a
$func$ LANGUAGE sql IMMUTABLE;
CREATE INDEX documents_max_amount_idx
ON documents (f_doc_max_amout(data_block -> 'PAYABLE_INVOICE_LINES'));Query (must match index expression):
SELECT *
FROM documents d
WHERE f_doc_max_amout(data_block -> 'PAYABLE_INVOICE_LINES') > 1000;dbfiddle here
Code Snippets
SELECT *
FROM documents d
WHERE EXISTS (
SELECT FROM jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') pil
WHERE (pil->'AMOUNT'->>'value')::decimal > 1000
);CREATE OR REPLACE FUNCTION f_doc_max_amout(jsonb)
RETURNS numeric AS
$func$
SELECT max((a->'AMOUNT'->>'value')::numeric)
FROM jsonb_array_elements($1) a
$func$ LANGUAGE sql IMMUTABLE;
CREATE INDEX documents_max_amount_idx
ON documents (f_doc_max_amout(data_block -> 'PAYABLE_INVOICE_LINES'));SELECT *
FROM documents d
WHERE f_doc_max_amout(data_block -> 'PAYABLE_INVOICE_LINES') > 1000;Context
StackExchange Database Administrators Q#202739, answer score: 12
Revisions (0)
No revisions yet.