patternsqlMinor
GIN index ignored when query has many (more than 18) OR statements
Viewed 0 times
ignoredstatementsquerymorethanginhasmanywhenindex
Problem
I have an issue where the execution is not using a GIN index when going over 18 OR statements.
This is my index:
this is the query:
```
explain analyze
select * from "emailjobs"
where "emailjobs"."$$meta.deleted" = false
and ("externalReferences" @> '[{"href":"/responsibilities/pending/06ad562d-379a-4827-a9d3-1de179e29852"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/0e754454-db34-4ec2-8192-898f34928e80"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/0f93b24c-ee8f-4872-896c-9a75fcb904e0"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/10f38c74-6b6f-4298-819f-a2a2e61f23ce"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/127468cf-0bb4-41fe-8d5d-2886a4f0d4a4"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/297aefb9-e4aa-4b9e-9dcd-8687df1b200a"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/2e5c4b39-4375-49be-9bee-7b8106502c59"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/2f08f86c-696b-4f71-bbe3-51cf2bc74387"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/2f443528-b173-4522-bdb2-7c942112d781"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/2f8e87f6-4573-478d-925d-fca286cc6cee"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/318417fa-9c41-4377-b815-02f40147ce65"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/3b4cd61c-6adc-4b56-8519-5bd2612fb214"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/3e5ed89e-bfa4-47f9-a056-9817f8e91a4a"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/404a90d6-a6e8-4068-8d0c-904685705129"}]'
This is my index:
CREATE INDEX emailjobs_externalreferences_gin
ON vsko_mailer_api_prod.emailjobs USING gin
("externalReferences" jsonb_path_ops)
TABLESPACE pg_default;this is the query:
```
explain analyze
select * from "emailjobs"
where "emailjobs"."$$meta.deleted" = false
and ("externalReferences" @> '[{"href":"/responsibilities/pending/06ad562d-379a-4827-a9d3-1de179e29852"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/0e754454-db34-4ec2-8192-898f34928e80"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/0f93b24c-ee8f-4872-896c-9a75fcb904e0"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/10f38c74-6b6f-4298-819f-a2a2e61f23ce"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/127468cf-0bb4-41fe-8d5d-2886a4f0d4a4"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/297aefb9-e4aa-4b9e-9dcd-8687df1b200a"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/2e5c4b39-4375-49be-9bee-7b8106502c59"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/2f08f86c-696b-4f71-bbe3-51cf2bc74387"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/2f443528-b173-4522-bdb2-7c942112d781"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/2f8e87f6-4573-478d-925d-fca286cc6cee"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/318417fa-9c41-4377-b815-02f40147ce65"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/3b4cd61c-6adc-4b56-8519-5bd2612fb214"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/3e5ed89e-bfa4-47f9-a056-9817f8e91a4a"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/404a90d6-a6e8-4068-8d0c-904685705129"}]'
Solution
The problem is that PostgreSQL assumes that more rows will be returned when you specify more
You could create a function that extracts the interesting parts of the JSON:
Then you can index
and query like this:
The “overlaps” operator should do what you want.
OR conditions, so at some point it will think that an index scan won't be faster any more.You could create a function that extracts the interesting parts of the JSON:
CREATE FUNCTION get_array(jsonb) RETURNS text[]
LANGUAGE sql IMMUTABLE AS
$SELECT array_agg(x->>'href') FROM jsonb_array_elements($1) AS v(x)$;Then you can index
CREATE INDEX ON vsko_mailer_api_prod.emailjobs USING gin (get_array("externalReferences"));and query like this:
... WHERE get_array("externalReferences")
&& ARRAY['/responsibilities/pending/06ad562d-379a-4827-a9d3-1de179e29852',
'/responsibilities/pending/0e754454-db34-4ec2-8192-898f34928e80',
...
]The “overlaps” operator should do what you want.
Code Snippets
CREATE FUNCTION get_array(jsonb) RETURNS text[]
LANGUAGE sql IMMUTABLE AS
$$SELECT array_agg(x->>'href') FROM jsonb_array_elements($1) AS v(x)$$;CREATE INDEX ON vsko_mailer_api_prod.emailjobs USING gin (get_array("externalReferences"));... WHERE get_array("externalReferences")
&& ARRAY['/responsibilities/pending/06ad562d-379a-4827-a9d3-1de179e29852',
'/responsibilities/pending/0e754454-db34-4ec2-8192-898f34928e80',
...
]Context
StackExchange Database Administrators Q#246732, answer score: 4
Revisions (0)
No revisions yet.