patternsqlMinor
How does postgres store array values?
Viewed 0 times
postgresarraystoredoeshowvalues
Problem
Postgres seems to do some sort of compression on array values. I have a table with 2.1 million rows, each has 2 columns of smallint arrays, all populated with 1440 values. most of these values are -32768. I was very surprised to see that the entire table, about 6 billion smallints, was only 540MB on disk. I did an experiment and it looks like Postgres is doing some kind of compression.
With 1000 smallints from 1 to 1000, pg_column_size returns 2048.
An array of 1000 smallints, all -32768, pg_column_size returns 74.
An array of 1000 smallints, alternating 1 & 0, pg_column_size returns 73.
Is this documented anywhere?
```
CREATE TABLE testing (
values smallint[]
);
SELECT pg_table_size('testing');
SELECT pg_column_size(testing.*) FROM testing
-- 1,2,3,...,1000
INSERT INTO testing VALUES(
'{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,3
With 1000 smallints from 1 to 1000, pg_column_size returns 2048.
An array of 1000 smallints, all -32768, pg_column_size returns 74.
An array of 1000 smallints, alternating 1 & 0, pg_column_size returns 73.
Is this documented anywhere?
```
CREATE TABLE testing (
values smallint[]
);
SELECT pg_table_size('testing');
SELECT pg_column_size(testing.*) FROM testing
-- 1,2,3,...,1000
INSERT INTO testing VALUES(
'{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,3
Solution
Internal representation of larger attributes will be sometimes compressed. More specifically, what works here is the TOAST (Oversized Attribute Storage component used in PostgreSQL). The threshold when values are considered for compression is 2000 bytes.
PostgreSQL stores array values in a custom, internal, binary format. Command line example below. Details also here.
answer written by @filiprem acting to extend basic info provided by @a-horse-with-no-name
pg_column_size() is not a logical length, but the size (in bytes) of actual internal representation of the column/variable. It is documented.PostgreSQL stores array values in a custom, internal, binary format. Command line example below. Details also here.
filip=# CREATE TABLE a(x text, a text[][]);
CREATE TABLE
filip=# insert into a select 'MARK', '{{ENE,DUE},{LIKE,FAKE}}';
INSERT 0 1
filip=# insert into a select 'MARK', '{{ENE,DUE},{LIKE,FAKE}}';
INSERT 0 1
filip=# checkpoint ;
CHECKPOINT
filip=# SELECT pg_relation_filepath('a');
pg_relation_filepath
----------------------
base/16385/16576
(1 row)
filip@szary:~$ sudo hexdump -C ~postgres/9.5/main/base/16385/16576
00000000 00 00 00 00 f0 99 b6 02 00 00 00 00 20 00 40 1f |............ .@.|
00000010 00 20 04 20 00 00 00 00 a0 9f b4 00 40 9f b4 00 |. . ........@...|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001f40 c8 07 08 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00001f50 02 00 02 00 02 08 18 00 0b 4d 41 52 4b 7b 02 00 |.........MARK{..|
00001f60 00 00 00 00 00 00 19 00 00 00 02 00 00 00 02 00 |................|
00001f70 00 00 01 00 00 00 01 00 00 00 1c 00 00 00 45 4e |..............EN|
00001f80 45 00 1c 00 00 00 44 55 45 00 20 00 00 00 4c 49 |E.....DUE. ...LI|
00001f90 4b 45 20 00 00 00 46 41 4b 45 00 00 00 00 00 00 |KE ...FAKE......|
00001fa0 c7 07 08 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00001fb0 01 00 02 00 02 08 18 00 0b 4d 41 52 4b 7b 02 00 |.........MARK{..|
00001fc0 00 00 00 00 00 00 19 00 00 00 02 00 00 00 02 00 |................|
00001fd0 00 00 01 00 00 00 01 00 00 00 1c 00 00 00 45 4e |..............EN|
00001fe0 45 00 1c 00 00 00 44 55 45 00 20 00 00 00 4c 49 |E.....DUE. ...LI|
00001ff0 4b 45 20 00 00 00 46 41 4b 45 00 00 00 00 00 00 |KE ...FAKE......|
00002000answer written by @filiprem acting to extend basic info provided by @a-horse-with-no-name
Code Snippets
filip=# CREATE TABLE a(x text, a text[][]);
CREATE TABLE
filip=# insert into a select 'MARK', '{{ENE,DUE},{LIKE,FAKE}}';
INSERT 0 1
filip=# insert into a select 'MARK', '{{ENE,DUE},{LIKE,FAKE}}';
INSERT 0 1
filip=# checkpoint ;
CHECKPOINT
filip=# SELECT pg_relation_filepath('a');
pg_relation_filepath
----------------------
base/16385/16576
(1 row)
filip@szary:~$ sudo hexdump -C ~postgres/9.5/main/base/16385/16576
00000000 00 00 00 00 f0 99 b6 02 00 00 00 00 20 00 40 1f |............ .@.|
00000010 00 20 04 20 00 00 00 00 a0 9f b4 00 40 9f b4 00 |. . ........@...|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001f40 c8 07 08 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00001f50 02 00 02 00 02 08 18 00 0b 4d 41 52 4b 7b 02 00 |.........MARK{..|
00001f60 00 00 00 00 00 00 19 00 00 00 02 00 00 00 02 00 |................|
00001f70 00 00 01 00 00 00 01 00 00 00 1c 00 00 00 45 4e |..............EN|
00001f80 45 00 1c 00 00 00 44 55 45 00 20 00 00 00 4c 49 |E.....DUE. ...LI|
00001f90 4b 45 20 00 00 00 46 41 4b 45 00 00 00 00 00 00 |KE ...FAKE......|
00001fa0 c7 07 08 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00001fb0 01 00 02 00 02 08 18 00 0b 4d 41 52 4b 7b 02 00 |.........MARK{..|
00001fc0 00 00 00 00 00 00 19 00 00 00 02 00 00 00 02 00 |................|
00001fd0 00 00 01 00 00 00 01 00 00 00 1c 00 00 00 45 4e |..............EN|
00001fe0 45 00 1c 00 00 00 44 55 45 00 20 00 00 00 4c 49 |E.....DUE. ...LI|
00001ff0 4b 45 20 00 00 00 46 41 4b 45 00 00 00 00 00 00 |KE ...FAKE......|
00002000Context
StackExchange Database Administrators Q#163659, answer score: 2
Revisions (0)
No revisions yet.