HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

How to get rid of a Bitmap Heap Scan if the table already has the proper indices

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
ridindicesthehowscanheaphasalreadygetbitmap

Problem

I am running PostgreSQL 9.6. These are the relevant definitions:

CREATE TABLE IF NOT EXISTS instagram.profiles_1000 (
    id                          SERIAL PRIMARY KEY,
    username                    VARCHAR(255) NOT NULL UNIQUE,
    followers                   BIGINT,
    tsv                         TSVECTOR
);

CREATE UNIQUE INDEX IF NOT EXISTS instagram_username_index
    ON instagram.profiles_1000(username);
CREATE INDEX IF NOT EXISTS instagram_followers_index
    ON instagram.profiles_1000(followers);
CREATE INDEX IF NOT EXISTS instagram_textsearch_index
    ON instagram.profiles_1000 USING GIN(tsv);


And the text vector is updated by a trigger:

CREATE FUNCTION instagram_documents_search_trigger() RETURNS trigger AS $
begin
  new.tsv :=
        setweight(to_tsvector(COALESCE(new.username, '')), 'D') || ' ' ||
        setweight(to_tsvector(COALESCE(new.full_name, '')), 'C') || ' ' ||
        setweight(to_tsvector(COALESCE(new.location_country, '')), 'B') || ' ' ||
        setweight(to_tsvector(COALESCE(new.location_region, '')), 'B') || ' ' ||
        setweight(to_tsvector(COALESCE(new.biography, '')), 'A') || ' ' ||
        setweight(to_tsvector(COALESCE(new.location_city, '')), 'A');
  return new;
end
$ LANGUAGE plpgsql;

CREATE TRIGGER instagram_tsvectorupdate BEFORE INSERT OR UPDATE
    ON instagram.profiles_1000 FOR EACH ROW
    EXECUTE PROCEDURE instagram_documents_search_trigger();


This is the query:

select instagram.profiles_1000.*, categories, followers as rank                                                                                            
from instagram.profiles_1000
join plainto_tsquery('arts') as q on q @@ tsv
left outer join instagram.profile_categories_agg on instagram.profiles_1000.username = instagram.profile_categories_agg.username
where followers is not null and followers > 0
order by (followers, -id) desc
limit 50;


This is the output of EXPLAIN (ANALYZE, BUFFERS):

https://explain.depesz.com/s/ceC

Solution

The "culprit" here takes up less than 1/4 of the total time. The real bottleneck is Index Only Scan using instagram_categories_username_category_agg, which takes 0.200 * 118453 = 23690.6 ms, almost 24 seconds, which is most of the time.

It looks like every user has exactly one category, unless that is just an amazing coincidence, so why is there a separate table for user category rather than it being an attribute of profiles_1000? That design appears to be the true culprit.

Anyway, the reason it has to do the Bitmap Heap Scan is because that is how it arrives at the correct answer. If it only did the Bitmap Index Scan, it wouldn't have any data about the matching rows, just their address. And it also wouldn't know if that address was really to a matching row, because of visibility, rechecks, and lossy bitmap compression.

Regular index scans also visit both the index and the table, they just don't separate out those operations as two different entries in the EXPLAIN plan the way a bitmap scan does.

Context

StackExchange Database Administrators Q#217263, answer score: 2

Revisions (0)

No revisions yet.