patternMajor
Quick nearest neighbor search in the 150-dimensional space
Viewed 0 times
neighborthespacesearchnearestquick150dimensional
Problem
I want to create a database using any of the possible RDBMS. It will have a table with approximately 150 columns. The objective is to perform nearest neighbor search of some other objects. So it's a NNS in the 150-dimensional space.
I already tried to use some obvious methods like L1 or L2 distances but of course it takes a lot of time for tables with many rows. Also I tried to look at the KD-tree (note I did not test it) and PG-Strom but they are not a good solution for data with a many dimensions.
Can I somehow improve the speed of the described search using math methods (like KD-tree) or tech methods (like PG-Strom)?
I will try to use any RDBMS which allow to improve speed of the NNS. But MySQL and PostgreSQL are the most appropriate DBMS for me.
I already tried to use some obvious methods like L1 or L2 distances but of course it takes a lot of time for tables with many rows. Also I tried to look at the KD-tree (note I did not test it) and PG-Strom but they are not a good solution for data with a many dimensions.
Can I somehow improve the speed of the described search using math methods (like KD-tree) or tech methods (like PG-Strom)?
I will try to use any RDBMS which allow to improve speed of the NNS. But MySQL and PostgreSQL are the most appropriate DBMS for me.
Solution
PostgreSQL 9.6 using
First install the cube extension
Now we will create some n-dimensional space with 100,000 points in 50 dimensions. In addition we'll add a GIST index.
Now we will generate a single point and use the `
That said there is one caveat,
To make it harder for people to break things, there is a limit of 100 on the number of dimensions of cubes. This is set in cubedata.h if you need something bigger.
You ask for 150 dimensions. That may present a minor complication.
cubeFirst install the cube extension
CREATE EXTENSION cube;Now we will create some n-dimensional space with 100,000 points in 50 dimensions. In addition we'll add a GIST index.
CREATE TEMP TABLE space_nd
AS
SELECT i, cube(array_agg(random()::float)) AS c
FROM generate_series(1,1e5) AS i
CROSS JOIN LATERAL generate_series(1,50)
AS x
GROUP BY i;
CREATE INDEX ON space_nd USING gist ( c );
ANALYZE space_nd;Now we will generate a single point and use the `
operater to find the nearest point using Eucledian distance.
WITH points AS (
SELECT cube(array_agg(random()::float)) AS c
FROM generate_series(1,50)
AS x
)
SELECT i,
pg_typeof(space_nd.c),
pg_typeof(points.c),
cube_distance(space_nd.c, points.c)
FROM space_nd
CROSS JOIN points
ORDER BY space_nd.c points.c
LIMIT 5;
PostgreSQL 9.6+ supports other distance operators over cube`. All of which can use the GIST index we created. Namely,a b float8 Euclidean distance between a and b.
a b float8 Taxicab (L-1 metric) distance between a and b.
a b float8 Chebyshev (L-inf metric) distance between a and b.That said there is one caveat,
To make it harder for people to break things, there is a limit of 100 on the number of dimensions of cubes. This is set in cubedata.h if you need something bigger.
You ask for 150 dimensions. That may present a minor complication.
Code Snippets
CREATE EXTENSION cube;CREATE TEMP TABLE space_nd
AS
SELECT i, cube(array_agg(random()::float)) AS c
FROM generate_series(1,1e5) AS i
CROSS JOIN LATERAL generate_series(1,50)
AS x
GROUP BY i;
CREATE INDEX ON space_nd USING gist ( c );
ANALYZE space_nd;WITH points AS (
SELECT cube(array_agg(random()::float)) AS c
FROM generate_series(1,50)
AS x
)
SELECT i,
pg_typeof(space_nd.c),
pg_typeof(points.c),
cube_distance(space_nd.c, points.c)
FROM space_nd
CROSS JOIN points
ORDER BY space_nd.c <-> points.c
LIMIT 5;a <-> b float8 Euclidean distance between a and b.
a <#> b float8 Taxicab (L-1 metric) distance between a and b.
a <=> b float8 Chebyshev (L-inf metric) distance between a and b.Context
StackExchange Database Administrators Q#163207, answer score: 21
Revisions (0)
No revisions yet.