patternsqlMinor
Understanding composite BTREE + GIN_TRGM_OPS index prioritization & odd lower() behavior
Viewed 0 times
understandingprioritizationlowerbehaviorgin_trgm_opsbtreecompositeindexodd
Problem
hoping someone can try to help me decrypt some index behavior. I'm working on enabling some simple
I've read that order doesn't matter in GIN indexes, so I guess my first question is whether or not it is also possible to make one index including multiple columns that would kick in for any combination of their usage? My guess is no, since the indexes definitel
contains type lookups on various user-data columns (~varchar CREATE TABLE people (
id SERIAL,
email character varying(255) not null,
first_name character varying(255) not null,
last_name character varying(255) not null,
user_category integer not null,
user_type character varying(255) not null
);
-- Dummy Data
INSERT INTO people (email, first_name, last_name, user_category, user_type)
SELECT
concat(md5(random()::text), '@not-real-email-plz.com'),
md5(random()::text),
md5(random()::text),
ceil(random() * 3),
('{Grunt,Peon,Ogre}'::text[])[ceil(random()*3)]
FROM
(SELECT * FROM generate_series(1,1000000) AS id) AS x;
-- Standard, existing lookup
CREATE INDEX index_people_category_type ON people USING btree (user_category, user_type);
-- taken from https://niallburkley.com/blog/index-columns-for-like-in-postgres/
CREATE INDEX idx_people_gin_user_category_and_user_type_and_full_name
ON people
USING GIN(user_category, user_type, (first_name || ' ' || last_name) gin_trgm_ops);
-- first name
CREATE INDEX idx_people_gin_user_category_and_user_type_and_first_name
ON people
USING GIN(user_category, user_type, first_name gin_trgm_ops);
-- last name
CREATE INDEX idx_people_gin_user_category_and_user_type_and_last_name
ON people
USING GIN(user_category, user_type, last_name gin_trgm_ops);
-- email
CREATE INDEX idx_people_gin_user_category_and_user_type_and_email
ON people
USING GIN(user_category, user_type, email gin_trgm_ops);
-- non-composite email (had for testing and raised more questions)
CREATE INDEX idx_people_gin_email
ON people
USING GIN(email gin_trgm_ops);
I've read that order doesn't matter in GIN indexes, so I guess my first question is whether or not it is also possible to make one index including multiple columns that would kick in for any combination of their usage? My guess is no, since the indexes definitel
Solution
Is the original b-tree totally redundant at this point? I expected it might still be picked by the planner if only those two columns were used if the b-tree was faster for those data types but that seems like it is not the case.
Not totally. The btree index can be used for ordering (although for 3 distinct values in each column, its not clear how much call there would be for that). In my hands the b-tree index actually is faster, but not by much. I expected it to be faster by more. Even if I change the != test to = (which is where b-tree shines), the b-tree index is still only slightly faster. The advantage btree has in multicolumn equality testing was mostly cancelled by the GIN's advantage in compressing the storage of tid-lists, which comes in handy when each of 3 values shows up 333,333 times. Don't expect this to carry over to other situations.
Results are way faster, and using the expected index. What is it about the lower() call that seems to make the planner skip a beat?
You have to build the index on the things you want to search. If you had built the index on
The cost seems negligibly lower, but wondering what this means for a fairly dynamic amount of columns that could be filtered by? Would it be ideal to make a non-tuple index for all fields too?
I don't know what you mean by a non-tuple index. When you make an N-column GIN index, it is about the same thing as making N single-column GIN indexes. The planner can combine individual indexes with BitmapAnd and BitmapOr, and GIN can combine multiple columns of a muli-column index within in much the same way, but without the transparency.
Not totally. The btree index can be used for ordering (although for 3 distinct values in each column, its not clear how much call there would be for that). In my hands the b-tree index actually is faster, but not by much. I expected it to be faster by more. Even if I change the != test to = (which is where b-tree shines), the b-tree index is still only slightly faster. The advantage btree has in multicolumn equality testing was mostly cancelled by the GIN's advantage in compressing the storage of tid-lists, which comes in handy when each of 3 values shows up 333,333 times. Don't expect this to carry over to other situations.
Results are way faster, and using the expected index. What is it about the lower() call that seems to make the planner skip a beat?
You have to build the index on the things you want to search. If you had built the index on
(user_category, user_type, lower(last_name) gin_trgm_ops); then it would use it. PostgreSQL just knows that lower() takes text and spits out text. It doesn't know that lower(a) LIKE lower(b) implies that a ILIKE b.The cost seems negligibly lower, but wondering what this means for a fairly dynamic amount of columns that could be filtered by? Would it be ideal to make a non-tuple index for all fields too?
I don't know what you mean by a non-tuple index. When you make an N-column GIN index, it is about the same thing as making N single-column GIN indexes. The planner can combine individual indexes with BitmapAnd and BitmapOr, and GIN can combine multiple columns of a muli-column index within in much the same way, but without the transparency.
Context
StackExchange Database Administrators Q#254400, answer score: 2
Revisions (0)
No revisions yet.