debugsqlModerate
Values larger than 1/3 of a buffer page cannot be indexed
Viewed 0 times
cannotlargerthanindexedpagevaluesbuffer
Problem
I am not very good with DB so please bear with me.
I am trying to put a very long JSON data to a table, this table was created by Django framework.
I am using Postgres on Heroku. So, when I try to put the data I get the following error:
My DB and table looks something like this:
```
gollahalli-me-django-test::DATABASE=> \dt
List of relations
Schema | Name | Type | Owner
--------+----------------------------+-------+----------------
public | auth_group | table | ffnyjettujyfck
public | auth_group_permissions | table | ffnyjettujyfck
public | auth_permission | table | ffnyjettujyfck
public | auth_user | table | ffnyjettujyfck
public | auth_user_groups | table | ffnyjettujyfck
public | auth_user_user_permissions | table | ffnyjettujyfck
public | django_admin_log | table | ffnyjettujyfck
public | django_content_type | table | ffnyjettujyfck
public | django_migrations | table | ffnyjettujyfck
public | django_session | table | ffnyjettujyfck
public | editor_contentmodel | table | ffnyjettujyfck
(11 rows)
gollahalli-me-django-test::DATABASE=> \d+ editor_contentmodel
Table "public.editor_contentmodel"
Column | Type | Modifiers | Storage | Stats target | Description
-----------+--------------------------+-----------+----------+--------------+-------------
ref_id | character varying(120) | not null | extended | |
conte
I am trying to put a very long JSON data to a table, this table was created by Django framework.
I am using Postgres on Heroku. So, when I try to put the data I get the following error:
File "/app/.heroku/python/lib/python3.6/site-packages/django/db/backends/utils.py", line 64, in execute
return self.cursor.execute(sql, params)
psycopg2.OperationalError: index row size 3496 exceeds maximum 2712 for index "editor_contentmodel_content_2192f49c_uniq"
HINT: Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.My DB and table looks something like this:
```
gollahalli-me-django-test::DATABASE=> \dt
List of relations
Schema | Name | Type | Owner
--------+----------------------------+-------+----------------
public | auth_group | table | ffnyjettujyfck
public | auth_group_permissions | table | ffnyjettujyfck
public | auth_permission | table | ffnyjettujyfck
public | auth_user | table | ffnyjettujyfck
public | auth_user_groups | table | ffnyjettujyfck
public | auth_user_user_permissions | table | ffnyjettujyfck
public | django_admin_log | table | ffnyjettujyfck
public | django_content_type | table | ffnyjettujyfck
public | django_migrations | table | ffnyjettujyfck
public | django_session | table | ffnyjettujyfck
public | editor_contentmodel | table | ffnyjettujyfck
(11 rows)
gollahalli-me-django-test::DATABASE=> \d+ editor_contentmodel
Table "public.editor_contentmodel"
Column | Type | Modifiers | Storage | Stats target | Description
-----------+--------------------------+-----------+----------+--------------+-------------
ref_id | character varying(120) | not null | extended | |
conte
Solution
You have an UNIQUE index on
I'm not sure why this there to begin with. So let's step back and address what this does. This makes sure that
-
Do you care that content is unique? If you do care that content is unique for ref_id, then what you probably want is to store the hash of the content. Something like..
This will instead store the md5sum of content on the btree. So long as ref_id has content with a unique md5 over that ref_id, you're good.
-
If you don't care that
It may be worth nothing that when you implement a
Will speed up the query
However, when you chance it to use the functional
The whole
UPDATE 1
Base on your JSON I would suggest storing it as a
UPDATE 2
You ask which indexes you should remove
Here is the surprising answer: you should remove all of them except:
As a last note, we don't much use
UPDATE 3
Let's go back to your earlier problem..
This is telling you the problem is specifically with index
```
"editor_contentmodel_content_2192f49c_uniq" UNIQUE CONSTRAINT, btree (conte
(content, ref_id), called editor_contentmodel_content_2192f49c_uniq"editor_contentmodel_content_2192f49c_uniq" UNIQUE CONSTRAINT, btree (content, ref_id)I'm not sure why this there to begin with. So let's step back and address what this does. This makes sure that
content, and ref_id are unique. However, in PostgreSQL the UNIQUE constraint is implemented with a btree which makes this a poor solution. Using this method, you're creating a btree with content which essentially duplicates the size of this small table, and makes for a gigantic index. A gigantic index that is still limited by content size though -- as you've found. It raises a few questions-
Do you care that content is unique? If you do care that content is unique for ref_id, then what you probably want is to store the hash of the content. Something like..
CREATE TABLE foo ( ref_id int, content text );
CREATE UNIQUE INDEX ON foo (ref_id,md5(content));This will instead store the md5sum of content on the btree. So long as ref_id has content with a unique md5 over that ref_id, you're good.
-
If you don't care that
content is unique consider removing it entirely.It may be worth nothing that when you implement a
UNIQUE constraint with a btree (as PostgreSQL does), you get an added index for free. Under a normal circumstance this has a fringe benefit.CREATE TABLE foo ( ref_id int, content text );
CREATE UNIQUE INDEX ON foo (ref_id,content);Will speed up the query
SELECT *
FROM foo
WHERE ref_id = 5
AND content = 'This content'However, when you chance it to use the functional
md5() variant there is no longer an index on content, so now to use that index you'll have to- Only query on ref_id,
- Add to ref_id a clause that
md5(content) = md5('This content')
The whole
text = text is over rated. That's almost never what you want. If you're looking to speed up query-time over text the btree is pretty useless. You likely want to look into - pgtrgm
- text_pattern_ops
- Full text search (FTS)
UPDATE 1
Base on your JSON I would suggest storing it as a
jsonb, and then creating the index on md5(content); so perhaps rather than the above instead run this.ALTER TABLE public.editor_contentmodel
ALTER COLUMN content
SET DATA TYPE jsonb
USING content::jsonb;
CREATE UNIQUE INDEX ON foo (ref_id,md5(content::text));UPDATE 2
You ask which indexes you should remove
gollahalli_me_django=> create unique index on editor_contentmodel (ref_id, md5(content::text));
CREATE INDEX
gollahalli_me_django=> \d editor_contentmodel;
Table "public.editor_contentmodel"
Column | Type | Modifiers
-----------+--------------------------+-----------
ref_id | character varying(120) | not null
content | jsonb | not null
timestamp | timestamp with time zone | not null
Indexes:
"editor_contentmodel_pkey" PRIMARY KEY, btree (ref_id)
"editor_contentmodel_content_2192f49c_uniq" UNIQUE CONSTRAINT, btree (content, ref_id) <---- 1
"editor_contentmodel_ref_id_md5_idx" UNIQUE, btree (ref_id, md5(content::text))
"editor_contentmodel_ref_id_8f74b4f3_like" btree (ref_id varchar_pattern_ops) <----2Here is the surprising answer: you should remove all of them except:
editor_contentmodel_pkey which says that all ref_id's need to be unique.editor_contentmodel_content_2192f49c_uniqthis index makes sure that you'reUNIQUEonref_idANDcontent, but if you can't have a duplicateref_idyou can never have a duplicate content for thatref_id. So you can never violate this index without also violatingeditor_contentmodel_pkey. That makes it pointless.
editor_contentmodel_ref_id_md5_idxthis index is also pointless for the same reason. You can never have a duplicatemd5(content::text)overref_idbecause regardless of what the value ofmd5(content::text)is you can never have a duplicateref_id.
editor_contentmodel_ref_id_8f74b4f3_likeis also a bad idea because you're duplicating the index overref_id. This isn't useless, it's just not-optimal. Instead, if you need avarchar_pattern_opsuse it instead over just thecontentfield.
As a last note, we don't much use
varchar in PostgreSQL because it's implemented as a varlena with a check constraint. There is no gain to it, and there is nothing lost when you simply use text. So unless there is a concrete reason why ref_id can ever be 120 characters but it can be 119 characters, then I would simply use the text type.UPDATE 3
Let's go back to your earlier problem..
psycopg2.OperationalError: index row size 3496 exceeds maximum 2712 for index "editor_contentmodel_content_2192f49c_uniq"This is telling you the problem is specifically with index
"editor_contentmodel_content_2192f49c_uniq". You've defined that as ```
"editor_contentmodel_content_2192f49c_uniq" UNIQUE CONSTRAINT, btree (conte
Code Snippets
"editor_contentmodel_content_2192f49c_uniq" UNIQUE CONSTRAINT, btree (content, ref_id)CREATE TABLE foo ( ref_id int, content text );
CREATE UNIQUE INDEX ON foo (ref_id,md5(content));CREATE TABLE foo ( ref_id int, content text );
CREATE UNIQUE INDEX ON foo (ref_id,content);SELECT *
FROM foo
WHERE ref_id = 5
AND content = 'This content'ALTER TABLE public.editor_contentmodel
ALTER COLUMN content
SET DATA TYPE jsonb
USING content::jsonb;
CREATE UNIQUE INDEX ON foo (ref_id,md5(content::text));Context
StackExchange Database Administrators Q#162820, answer score: 12
Revisions (0)
No revisions yet.