patternsqlMinor
Inner join using an array column
Viewed 0 times
columnarrayjoinusinginner
Problem
Having trouble indexing and executing a query in
The query includes an
Using PostgreSQL. This is what the table looks like. As you can see, the column 'Names' is a list property and it is the column that the
This is the query that we are trying to do:
My background is from using App Engine's Big Table, so I've used equality operators here to indicate that
Is this possible to do in PostgreSQL?
If so, how should the index be set up (we can't figure out how to set up an index that takes into account the three properties)?
O (log n) time.The query includes an
INNER JOIN, an ORDER BY, and an equality operation. If I understand the laws of databases correctly, a query can be indexed and executed in O (log n) time (or thereabouts) if a non-equality operator is not used on more than one column. In this case, I believe the INNER JOIN does count as an equality operator and the non-equality operator would be the ORDER BY portion of the query. This table has upwards of 10,000,000 rows and needs to handle several reads and writes per second.Using PostgreSQL. This is what the table looks like. As you can see, the column 'Names' is a list property and it is the column that the
INNER JOIN goes against:Age Names Date
34 ['carla', 'john', 'sam'] 3/13/2011
26 ['json', 'cindy', 'joel'] 3/13/2011
72 ['beth', 'amber', 'susie'] 3/13/2011
14 ['john', 'jim', 'debie'] 3/13/2011This is the query that we are trying to do:
SELECT * FROM the_table WHERE Age==26 AND Names=='john' ORDER BY DateMy background is from using App Engine's Big Table, so I've used equality operators here to indicate that
'john' should be one of the names in the Names column. This would be an acceptable query in GAE's big table, it would execute in O (log N) time as all Big Table queries are reqyured to do. I am assuming there is a way to do this in PostgreSQL as well since PostgreSQL accepts list data types as columns.Is this possible to do in PostgreSQL?
If so, how should the index be set up (we can't figure out how to set up an index that takes into account the three properties)?
Solution
This is possible for PostgreSQL 8.4 or later. In order to create a multi-column GIN index, you need to install the additional module btree_gin.
Simple types like
In PostgreSQL 9.1 or later you can install it with
In older versions you would run as privileged system user (like postgres):
For a PostgreSQL 8.4 installation on Debian,this might be:
Then, given this table:
... you can create this multi-column GIN index:
... which can be used by a query like:
Note that a GIN index caries a non-trivial cost for write operations.
But that's hardly worth mentioning in comparison to what a
Simple types like
integer or date are usually better off with a (default) B-Tree index, so that's not installed with standard PostgreSQL. But for a case like this a multi-column index is fastest, so you need the additional index methods for the plain types.In PostgreSQL 9.1 or later you can install it with
CREATE EXTENSION:CREATE EXTENSION btree_gin;In older versions you would run as privileged system user (like postgres):
psql -d dbname -f SHAREDIR/contrib/btree_gin.sql
For a PostgreSQL 8.4 installation on Debian,this might be:
psql -d mydb -f /usr/share/postgresql/8.4/contrib/btree_gin.sql
Then, given this table:
CREATE TEMP TABLE tbl (age int, names text[], thedate date);... you can create this multi-column GIN index:
CREATE INDEX tbl_gin_idx ON tbl USING GIN (names, age, thedate);... which can be used by a query like:
SELECT * FROM tbl
WHERE age = 26
AND '{json}'::text[] <@ names
ORDER BY thedate;Note that a GIN index caries a non-trivial cost for write operations.
But that's hardly worth mentioning in comparison to what a
SELECT on 10 million rows without index would do to you.Code Snippets
CREATE EXTENSION btree_gin;CREATE TEMP TABLE tbl (age int, names text[], thedate date);CREATE INDEX tbl_gin_idx ON tbl USING GIN (names, age, thedate);SELECT * FROM tbl
WHERE age = 26
AND '{json}'::text[] <@ names
ORDER BY thedate;Context
StackExchange Database Administrators Q#39082, answer score: 7
Revisions (0)
No revisions yet.