HiveBrain v1.2.0
Get Started
← Back to all entries
debugsqlModerate

Values larger than 1/3 of a buffer page cannot be indexed

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
cannotlargerthanindexedpagevaluesbuffer

Problem

I am not very good with DB so please bear with me.

I am trying to put a very long JSON data to a table, this table was created by Django framework.

I am using Postgres on Heroku. So, when I try to put the data I get the following error:

File "/app/.heroku/python/lib/python3.6/site-packages/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
psycopg2.OperationalError: index row size 3496 exceeds maximum 2712 for index "editor_contentmodel_content_2192f49c_uniq"
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.


My DB and table looks something like this:

```
gollahalli-me-django-test::DATABASE=> \dt
List of relations
Schema | Name | Type | Owner
--------+----------------------------+-------+----------------
public | auth_group | table | ffnyjettujyfck
public | auth_group_permissions | table | ffnyjettujyfck
public | auth_permission | table | ffnyjettujyfck
public | auth_user | table | ffnyjettujyfck
public | auth_user_groups | table | ffnyjettujyfck
public | auth_user_user_permissions | table | ffnyjettujyfck
public | django_admin_log | table | ffnyjettujyfck
public | django_content_type | table | ffnyjettujyfck
public | django_migrations | table | ffnyjettujyfck
public | django_session | table | ffnyjettujyfck
public | editor_contentmodel | table | ffnyjettujyfck
(11 rows)

gollahalli-me-django-test::DATABASE=> \d+ editor_contentmodel
Table "public.editor_contentmodel"
Column | Type | Modifiers | Storage | Stats target | Description
-----------+--------------------------+-----------+----------+--------------+-------------
ref_id | character varying(120) | not null | extended | |
conte

Solution

You have an UNIQUE index on (content, ref_id), called editor_contentmodel_content_2192f49c_uniq

"editor_contentmodel_content_2192f49c_uniq" UNIQUE CONSTRAINT, btree (content, ref_id)


I'm not sure why this there to begin with. So let's step back and address what this does. This makes sure that content, and ref_id are unique. However, in PostgreSQL the UNIQUE constraint is implemented with a btree which makes this a poor solution. Using this method, you're creating a btree with content which essentially duplicates the size of this small table, and makes for a gigantic index. A gigantic index that is still limited by content size though -- as you've found. It raises a few questions

-
Do you care that content is unique? If you do care that content is unique for ref_id, then what you probably want is to store the hash of the content. Something like..

CREATE TABLE foo ( ref_id int, content text );
CREATE UNIQUE INDEX ON foo (ref_id,md5(content));


This will instead store the md5sum of content on the btree. So long as ref_id has content with a unique md5 over that ref_id, you're good.

-
If you don't care that content is unique consider removing it entirely.

It may be worth nothing that when you implement a UNIQUE constraint with a btree (as PostgreSQL does), you get an added index for free. Under a normal circumstance this has a fringe benefit.

CREATE TABLE foo ( ref_id int, content text );
CREATE UNIQUE INDEX ON foo (ref_id,content);


Will speed up the query

SELECT *
FROM foo
WHERE ref_id = 5
  AND content = 'This content'


However, when you chance it to use the functional md5() variant there is no longer an index on content, so now to use that index you'll have to

  • Only query on ref_id,



  • Add to ref_id a clause that md5(content) = md5('This content')



The whole text = text is over rated. That's almost never what you want. If you're looking to speed up query-time over text the btree is pretty useless. You likely want to look into

  • pgtrgm



  • text_pattern_ops



  • Full text search (FTS)



UPDATE 1

Base on your JSON I would suggest storing it as a jsonb, and then creating the index on md5(content); so perhaps rather than the above instead run this.

ALTER TABLE public.editor_contentmodel
  ALTER COLUMN content
  SET DATA TYPE jsonb
  USING content::jsonb;

CREATE UNIQUE INDEX ON foo (ref_id,md5(content::text));


UPDATE 2

You ask which indexes you should remove

gollahalli_me_django=> create unique index on editor_contentmodel (ref_id, md5(content::text));
CREATE INDEX
gollahalli_me_django=> \d editor_contentmodel;
        Table "public.editor_contentmodel"
  Column   |           Type           | Modifiers
-----------+--------------------------+-----------
 ref_id    | character varying(120)   | not null
 content   | jsonb                    | not null
 timestamp | timestamp with time zone | not null
Indexes:
    "editor_contentmodel_pkey" PRIMARY KEY, btree (ref_id)
    "editor_contentmodel_content_2192f49c_uniq" UNIQUE CONSTRAINT, btree (content, ref_id) <---- 1
    "editor_contentmodel_ref_id_md5_idx" UNIQUE, btree (ref_id, md5(content::text))
    "editor_contentmodel_ref_id_8f74b4f3_like" btree (ref_id varchar_pattern_ops) <----2


Here is the surprising answer: you should remove all of them except: editor_contentmodel_pkey which says that all ref_id's need to be unique.

  • editor_contentmodel_content_2192f49c_uniq this index makes sure that you're UNIQUE on ref_id AND content, but if you can't have a duplicate ref_id you can never have a duplicate content for that ref_id. So you can never violate this index without also violating editor_contentmodel_pkey. That makes it pointless.



  • editor_contentmodel_ref_id_md5_idx this index is also pointless for the same reason. You can never have a duplicate md5(content::text) over ref_id because regardless of what the value of md5(content::text) is you can never have a duplicate ref_id.



  • editor_contentmodel_ref_id_8f74b4f3_like is also a bad idea because you're duplicating the index over ref_id. This isn't useless, it's just not-optimal. Instead, if you need a varchar_pattern_ops use it instead over just the content field.



As a last note, we don't much use varchar in PostgreSQL because it's implemented as a varlena with a check constraint. There is no gain to it, and there is nothing lost when you simply use text. So unless there is a concrete reason why ref_id can ever be 120 characters but it can be 119 characters, then I would simply use the text type.

UPDATE 3

Let's go back to your earlier problem..

psycopg2.OperationalError: index row size 3496 exceeds maximum 2712 for index "editor_contentmodel_content_2192f49c_uniq"


This is telling you the problem is specifically with index "editor_contentmodel_content_2192f49c_uniq". You've defined that as

```
"editor_contentmodel_content_2192f49c_uniq" UNIQUE CONSTRAINT, btree (conte

Code Snippets

"editor_contentmodel_content_2192f49c_uniq" UNIQUE CONSTRAINT, btree (content, ref_id)
CREATE TABLE foo ( ref_id int, content text );
CREATE UNIQUE INDEX ON foo (ref_id,md5(content));
CREATE TABLE foo ( ref_id int, content text );
CREATE UNIQUE INDEX ON foo (ref_id,content);
SELECT *
FROM foo
WHERE ref_id = 5
  AND content = 'This content'
ALTER TABLE public.editor_contentmodel
  ALTER COLUMN content
  SET DATA TYPE jsonb
  USING content::jsonb;

CREATE UNIQUE INDEX ON foo (ref_id,md5(content::text));

Context

StackExchange Database Administrators Q#162820, answer score: 12

Revisions (0)

No revisions yet.