HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Quick search the most similar objects in the n-dimensional space

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
thespacesearchobjectsquickdimensionalsimilarmost

Problem

Lets assume that we have a points in the n-dimensional space. So we have a n coords(n columns) which can describe location of the each point.

We need to implement a table which can be used for a quick searching the most similar points, i.e. points which have the smallest distance to the desired point.

E.g. points in the db:

id  c1  c2  c3  c4  c5
1   5   19  42  12  16
2   3   23  38  15  12
3   14  21  32  33  1
4   12  29  21  24  5


If we want to find the best matching for point with coords:

c1 c2  c3  c4  c5
4  20  40  14  15


We will get points with id 1 and 2.

We also have mean coordinate for each dimension(column) and vector for each point in which first element - number of the dimension in which point has the largest difference from the mean coordinate in this dimension, and last - number of the dimension in which point has the smallest difference. Maybe it can be used for the more rapid filtering points which have the biggest distance to the desired point.

So how can I do something like this using MySQL?

I think the composite index and order by abs(cx - $mycx) can be a good solution, but I can't use it because I will have more then 16 columns which I need to include in the one index.

Any help will be very useful!

Solution

PostgreSQL using cube

Using PostgreSQL this becomes pretty easy with the cube extension which I explained on your other question.. Sample data.

CREATE TABLE foo AS
SELECT *
FROM ( VALUES
  (1, cube(ARRAY[5 , 19, 42, 12, 16]) ),
  (2, cube(ARRAY[3 , 23, 38, 15, 12]) ),
  (3, cube(ARRAY[14, 21, 32, 33, 1 ]) ),
  (4, cube(ARRAY[12, 29, 21, 24, 5 ]) )
) AS t(id, c);


Finding distance...

SELECT
  id,
  c,
  c2,
  cube_distance(c,c2),
  rank() OVER (ORDER BY cube_distance(c,c2))
FROM foo
CROSS JOIN ( SELECT cube(ARRAY[4,20,40,14,15]) )
  AS t(c2)
ORDER BY cube_distance(c, c2);


Outputs.

id |          c          |         c2          |  cube_distance   | rank 
----+---------------------+---------------------+------------------+------
  1 | (5, 19, 42, 12, 16) | (4, 20, 40, 14, 15) |  3.3166247903554 |    1
  2 | (3, 23, 38, 15, 12) | (4, 20, 40, 14, 15) | 4.89897948556636 |    2
  4 | (12, 29, 21, 24, 5) | (4, 20, 40, 14, 15) | 26.5706605111728 |    3
  3 | (14, 21, 32, 33, 1) | (4, 20, 40, 14, 15) | 26.8700576850888 |    4

Code Snippets

CREATE TABLE foo AS
SELECT *
FROM ( VALUES
  (1, cube(ARRAY[5 , 19, 42, 12, 16]) ),
  (2, cube(ARRAY[3 , 23, 38, 15, 12]) ),
  (3, cube(ARRAY[14, 21, 32, 33, 1 ]) ),
  (4, cube(ARRAY[12, 29, 21, 24, 5 ]) )
) AS t(id, c);
SELECT
  id,
  c,
  c2,
  cube_distance(c,c2),
  rank() OVER (ORDER BY cube_distance(c,c2))
FROM foo
CROSS JOIN ( SELECT cube(ARRAY[4,20,40,14,15]) )
  AS t(c2)
ORDER BY cube_distance(c, c2);
id |          c          |         c2          |  cube_distance   | rank 
----+---------------------+---------------------+------------------+------
  1 | (5, 19, 42, 12, 16) | (4, 20, 40, 14, 15) |  3.3166247903554 |    1
  2 | (3, 23, 38, 15, 12) | (4, 20, 40, 14, 15) | 4.89897948556636 |    2
  4 | (12, 29, 21, 24, 5) | (4, 20, 40, 14, 15) | 26.5706605111728 |    3
  3 | (14, 21, 32, 33, 1) | (4, 20, 40, 14, 15) | 26.8700576850888 |    4

Context

StackExchange Database Administrators Q#158653, answer score: 3

Revisions (0)

No revisions yet.