patternsqlModerate
PostgreSQL filtering on array length inside JSON
Viewed 0 times
postgresqlarraylengthfilteringjsoninside
Problem
I have a table
I created an index on the length of the
but when I filter, I still get a sequential scan when I try to filter on it:
That's about 8 minutes to filter! Did I do something wrong here, or is this the consequence of PostgreSQL not keeping statistics on JSON(B) objects? It should be possible to flatten out this
edit: these array lengths do not vary much. There are only 4 distinct values in the data currently, and I don't expect to have many more. Is the index just not very useful in this case, or can I improve filtering some other way?
table with a JSONB field data, which contains a variable-length array, e.g.{"label": "xyz", "items": [ ... ]}I created an index on the length of the
"items" element:CREATE INDEX n_items ON table ( JSONB_ARRAY_LENGTH(data->'items') )but when I filter, I still get a sequential scan when I try to filter on it:
EXPLAIN ANALYZE SELECT COUNT(*) FROM table WHERE JSONB_ARRAY_LENGTH(table.data->'items') = 2;
QUERY PLAN
-----------------------------------------------------------------------------------------
Aggregate (cost=2565655.67..2565655.68 rows=1 width=8)
-> Seq Scan on table (cost=0.00..2535256.19 rows=12159794 width=8)
Filter: (jsonb_array_length((table.data -> 'items'::text)) = 2)
Planning time: 0.121 ms
Execution time: 482891.694 msThat's about 8 minutes to filter! Did I do something wrong here, or is this the consequence of PostgreSQL not keeping statistics on JSON(B) objects? It should be possible to flatten out this
data column, but I'd like to be sure that's what I need to do before I start working on it.edit: these array lengths do not vary much. There are only 4 distinct values in the data currently, and I don't expect to have many more. Is the index just not very useful in this case, or can I improve filtering some other way?
Solution
Without knowing your data, I can only guess that the selectivity of your index is low (which happen if the length of the array does not vary much).
One trick to overcome this might be changing the query slightly and creating a covering index. For this, choose a
This will hopefully turn into an index-only scan (I tested this omitting the
One trick to overcome this might be changing the query slightly and creating a covering index. For this, choose a
NOT NULL column (for example, the primary key of the table) to count, and then include this column in the index:CREATE INDEX n_items ON your_table (jsonb_array_length(data->'items'), id);
SELECT count(id)
FROM your_table
WHERE JSONB_ARRAY_LENGTH(table.data->'items') = 2;This will hopefully turn into an index-only scan (I tested this omitting the
jsonb part, but you will be able to tell if it works).Code Snippets
CREATE INDEX n_items ON your_table (jsonb_array_length(data->'items'), id);
SELECT count(id)
FROM your_table
WHERE JSONB_ARRAY_LENGTH(table.data->'items') = 2;Context
StackExchange Database Administrators Q#149832, answer score: 17
Revisions (0)
No revisions yet.