HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Good layout of 3d point data for spatial queries in Postgres?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgrespointlayoutforgoodqueriesdataspatial

Problem

Like shown in another question, I deal with a lot (>10,000,000) entries of points in a 3D space. These points are defined like this:

CREATE TYPE float3d AS (
  x real,
  y real,
  z real);


If I am not mistaken 3*8 bytes + 8 bytes padding are needed (MAXALIGN is 8) to store one of these points. Is there a better way to store this kind of data? In the afore mentioned question it was stated that composite types involve quite some overhead.

I oftentimes do spatial queries like this:

SELECT t1.id, t1.parent_id, (t1.location).x, (t1.location).y, (t1.location).z,
         t1.confidence, t1.radius, t1.skeleton_id, t1.user_id,
         t2.id, t2.parent_id, (t2.location).x, (t2.location).y, (t2.location).z,
         t2.confidence, t2.radius, t2.skeleton_id, t2.user_id
  FROM treenode t1
       INNER JOIN treenode t2 ON
         (   (t1.id = t2.parent_id OR t1.parent_id = t2.id)
          OR (t1.parent_id IS NULL AND t1.id = t2.id))
        WHERE (t1.LOCATION).z = 41000.0
          AND (t1.LOCATION).x > 2822.6
          AND (t1.LOCATION).x  33629.8
          AND (t1.LOCATION).y < 65458.6
          AND t1.project_id = 1 LIMIT 5000;


A query like this takes about 160 ms, but I wonder if this could be reduced.

This is the table layout the structure is used in:

```
Column | Type | Modifiers
---------------+--------------------------+-------------------------------------------------------
id | bigint | not null default nextval('location_id_seq'::regclass)
user_id | integer | not null
creation_time | timestamp with time zone | not null default now()
edition_time | timestamp with time zone | not null default now()
project_id | integer | not null
location | float3d | not null
editor_id | integer |
parent_id | bigint |
radius | real

Solution

The composite type is clean design, but it does not help performance at all.

First of all, float translates to float8 a.k.a. double precision in Postgres. You are building on a misunderstanding.

The real data type occupies 4 byte (not 8). It has to be aligned at multiples of 4 bytes.

Measure actual sizes with pg_column_size().

SQL Fiddle demonstrating actual sizes.

The composite type real3d occupies 36 bytes. That's:

23 byte tuple header
1 byte padding
4 bytes real x
4 bytes real y
4 bytes real z
---
36 bytes


If you embed that into a table, padding may have to be added. On the other hand the header of the type can be 3 byte smaller on disk. Representation on disk is typically a bit smaller than in RAM. Doesn't make a lot of difference.

More:

  • Configuring PostgreSQL for read performance



  • Calculating and saving space in PostgreSQL



Table layout

Use this equivalent design to reduce row size substantially:

Column     |           Type           |                       Modifiers
---------------+--------------------------+---------------------------------
 id            | bigint                   | not null default nextval(...
 creation_time | timestamp with time zone | not null default now()
 edition_time  | timestamp with time zone | not null default now()
 user_id       | integer                  | not null
 project_id    | integer                  | not null
 location_x    | real                     | not null
 location_y    | real                     | not null
 location_z    | real                     | not null
 radius        | real                     | not null default 0
 skeleton_id   | integer                  | not null
 confidence    | smallint                 | not null default 5
 parent_id     | bigint                   |
 editor_id     | integer                  |


Test before and after to verify my claim:

SELECT pg_relation_size('treenode') As table_size;

SELECT avg(pg_column_size(t) AS avg_row_size
FROM   treenode t;


More details:

  • Measure the size of a PostgreSQL table row

Code Snippets

23 byte tuple header
1 byte padding
4 bytes real x
4 bytes real y
4 bytes real z
---
36 bytes
Column     |           Type           |                       Modifiers
---------------+--------------------------+---------------------------------
 id            | bigint                   | not null default nextval(...
 creation_time | timestamp with time zone | not null default now()
 edition_time  | timestamp with time zone | not null default now()
 user_id       | integer                  | not null
 project_id    | integer                  | not null
 location_x    | real                     | not null
 location_y    | real                     | not null
 location_z    | real                     | not null
 radius        | real                     | not null default 0
 skeleton_id   | integer                  | not null
 confidence    | smallint                 | not null default 5
 parent_id     | bigint                   |
 editor_id     | integer                  |
SELECT pg_relation_size('treenode') As table_size;

SELECT avg(pg_column_size(t) AS avg_row_size
FROM   treenode t;

Context

StackExchange Database Administrators Q#72787, answer score: 3

Revisions (0)

No revisions yet.