Good layout of 3d point data for spatial queries in Postgres?
Problem
As shown in another question, I deal with a large number (>10,000,000) of entries of points in a 3D space. These points are defined like this:
```
CREATE TYPE float3d AS (
  x real,
  y real,
  z real);
```
If I am not mistaken, 3*8 bytes + 8 bytes padding are needed (MAXALIGN is 8) to store one of these points. Is there a better way to store this kind of data? In the aforementioned question it was stated that composite types involve quite some overhead.
I oftentimes run spatial queries like this:
```
SELECT t1.id, t1.parent_id, (t1.location).x, (t1.location).y, (t1.location).z,
       t1.confidence, t1.radius, t1.skeleton_id, t1.user_id,
       t2.id, t2.parent_id, (t2.location).x, (t2.location).y, (t2.location).z,
       t2.confidence, t2.radius, t2.skeleton_id, t2.user_id
FROM treenode t1
INNER JOIN treenode t2
   ON (   (t1.id = t2.parent_id OR t1.parent_id = t2.id)
       OR (t1.parent_id IS NULL AND t1.id = t2.id))
WHERE (t1.location).z = 41000.0
  AND (t1.location).x > 2822.6
  AND (t1.location).x < 33629.8
  AND (t1.location).y < 65458.6
  AND t1.project_id = 1
LIMIT 5000;
```
A query like this takes about 160 ms, but I wonder if this could be reduced.
This is the table layout the structure is used in:
```
    Column     |           Type           |                       Modifiers
---------------+--------------------------+-------------------------------------------------------
 id            | bigint                   | not null default nextval('location_id_seq'::regclass)
 user_id       | integer                  | not null
 creation_time | timestamp with time zone | not null default now()
 edition_time  | timestamp with time zone | not null default now()
 project_id    | integer                  | not null
 location      | float3d                  | not null
 editor_id     | integer                  |
 parent_id     | bigint                   |
 radius        | real                     |
```
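To see where those 160 ms are spent, the plan for the query can be inspected with `EXPLAIN` (standard Postgres tooling, not part of the original post; the join is trimmed here for brevity):
```
EXPLAIN (ANALYZE, BUFFERS)
SELECT t1.id, t2.id
FROM treenode t1
INNER JOIN treenode t2
   ON (   (t1.id = t2.parent_id OR t1.parent_id = t2.id)
       OR (t1.parent_id IS NULL AND t1.id = t2.id))
WHERE (t1.location).z = 41000.0
  AND (t1.location).x > 2822.6
  AND (t1.location).x < 33629.8
  AND (t1.location).y < 65458.6
  AND t1.project_id = 1
LIMIT 5000;
```
A sequential scan in the output would suggest that an index covering `project_id` and the location fields is worth testing.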
Solution
The composite type is clean design, but it does not help performance at all.
First of all, `float` translates to `float8`, a.k.a. `double precision`, in Postgres. You are building on a misunderstanding.
The `real` data type occupies 4 bytes (not 8) and has to be aligned at multiples of 4 bytes.
Measure actual sizes with `pg_column_size()`.
SQL Fiddle demonstrating actual sizes.
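A minimal sketch of such a measurement, assuming the `float3d` type from the question is installed:
```
SELECT pg_column_size('(1,2,3)'::float3d) AS composite_size,
       pg_column_size(1::real)            AS real_size,
       pg_column_size(1::float8)          AS float8_size;
```
`real_size` comes back as 4 and `float8_size` as 8, which is the distinction made above; the composite size shows the per-value overhead discussed next.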
The composite type `float3d` occupies 36 bytes. That's:
```
23 bytes tuple header
 1 byte  padding
 4 bytes real x
 4 bytes real y
 4 bytes real z
---------
36 bytes
```
If you embed that into a table, padding may have to be added. On the other hand, the header of the type can be 3 bytes smaller on disk. The representation on disk is typically a bit smaller than in RAM. It doesn't make a lot of difference.
More:
- Configuring PostgreSQL for read performance
- Calculating and saving space in PostgreSQL
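The difference between an embedded composite and plain columns can be checked directly; a throwaway sketch with temporary tables (the table names are invented here, and the `float3d` type from the question is assumed):
```
CREATE TEMP TABLE pt_composite (location float3d);
CREATE TEMP TABLE pt_plain     (x real, y real, z real);

INSERT INTO pt_composite VALUES ('(1,2,3)');
INSERT INTO pt_plain     VALUES (1, 2, 3);

SELECT pg_column_size(t) AS composite_row FROM pt_composite t;
SELECT pg_column_size(t) AS plain_row     FROM pt_plain t;
```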
Table layout
Use this equivalent design to reduce row size substantially:
```
    Column     |           Type           |            Modifiers
---------------+--------------------------+---------------------------------
 id            | bigint                   | not null default nextval(...
 creation_time | timestamp with time zone | not null default now()
 edition_time  | timestamp with time zone | not null default now()
 user_id       | integer                  | not null
 project_id    | integer                  | not null
 location_x    | real                     | not null
 location_y    | real                     | not null
 location_z    | real                     | not null
 radius        | real                     | not null default 0
 skeleton_id   | integer                  | not null
 confidence    | smallint                 | not null default 5
 parent_id     | bigint                   |
 editor_id     | integer                  |
```
Test before and after to verify my claim:
```
SELECT pg_relation_size('treenode') AS table_size;

SELECT avg(pg_column_size(t)) AS avg_row_size
FROM   treenode t;
```
More details:
- Measure the size of a PostgreSQL table row
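The equivalent design above can also be written out as DDL; a sketch that assumes the same sequence and defaults as the original table:
```
CREATE TABLE treenode (
  id            bigint      NOT NULL DEFAULT nextval('location_id_seq'::regclass),
  creation_time timestamptz NOT NULL DEFAULT now(),
  edition_time  timestamptz NOT NULL DEFAULT now(),
  user_id       integer     NOT NULL,
  project_id    integer     NOT NULL,
  location_x    real        NOT NULL,
  location_y    real        NOT NULL,
  location_z    real        NOT NULL,
  radius        real        NOT NULL DEFAULT 0,
  skeleton_id   integer     NOT NULL,
  confidence    smallint    NOT NULL DEFAULT 5,
  parent_id     bigint,
  editor_id     integer
);
```
Ordering the wide, fixed-width NOT NULL columns first minimizes alignment padding, which is where the row-size saving comes from.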
Context
StackExchange Database Administrators Q#72787, answer score: 3