HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Help optimizing this query with multiple where exists

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
thiswithquerywherehelpoptimizingmultipleexists

Problem

In the application I'm building, the user is able to define 'types' where each 'type' has a set of 'attributes'.

The user is able to create instances of products by defining a value for each attribute the product's type has.

A pic of the schema:

I'm creating the query where the user specifies the attributes values and the product type and with that I should return all the product id's that meets the query.

The problem I see in my query is that I'm performing a whole select * from attributes_products ... for each attribute that the product's type has.

Is there a way to optimize this?
If I create an index in the column attributes_products.product_id would this query be actually optimal?

Example of a query where I'm looking for a product whose type has 3 attributes:

select p.id
from Products as p
where 
exists(
    select * from attributes_products
    where
        product_id = p.id AND
        attribute_id = 27 AND
        value = 'some_value'
) AND
exists(
    select * from attributes_products
    where
        product_id = p.id AND
        attribute_id = 28 AND
        value = 'other_value'
) AND
exists(
    select * from attributes_products
    where
        product_id = p.id AND
        attribute_id = 29 AND
        value = 'oother_value'
)


Many thanks.

Conclusions

So, Gareth Rees (selected answer) proposed another solution which involves multiple Joins.
Here is the explanation of its query (done by PGAdmin):

This is the explanation of the original query:

I believe that the selected answer is slightly faster, but consumes a lot more memory (because of the triple join).
I believe that my original query is slightly slower (very slightly, since there's an index on the attributes_products table) but a lot more efficient in memory.

Solution

SQL allows you to join the same table multiple times, so what you need here is:

SELECT p.id FROM products AS p
JOIN attributes_products AS ap1
  ON ap1.product_id = p.id AND ap1.attribute_id = 27 AND ap1.value = '...'
JOIN attributes_products AS ap2
  ON ap2.product_id = p.id AND ap2.attribute_id = 28 AND ap2.value = '...'
JOIN attributes_products AS ap3
  ON ap3.product_id = p.id AND ap3.attribute_id = 29 AND ap3.value = '...'


Here's the toy MySQL database that I'm using to answer this question:

CREATE TABLE products (
    id INTEGER PRIMARY KEY AUTO_INCREMENT
);
CREATE TABLE attributes_products (
    product_id INTEGER NOT NULL,
    attribute_id INTEGER NOT NULL,
    value CHAR(40)
);
CREATE INDEX ap_product ON attributes_products (product_id);
CREATE INDEX ap_attribute ON attributes_products (attribute_id);

INSERT INTO products VALUES (1);
INSERT INTO products VALUES (2);
INSERT INTO attributes_products VALUES (1, 27, 'a');
INSERT INTO attributes_products VALUES (1, 28, 'b');
INSERT INTO attributes_products VALUES (1, 29, 'c');


With my query above, MySQL reports the following query plan:

+----+-------------+-------+--------+-------------------------+--------------+---------+---------------------+------+--------------------------+
| id | select_type | table | type   | possible_keys           | key          | key_len | ref                 | rows | Extra                    |
+----+-------------+-------+--------+-------------------------+--------------+---------+---------------------+------+--------------------------+
|  1 | SIMPLE      | ap1   | ref    | ap_product,ap_attribute | ap_attribute | 4       | const               |    1 | Using where              |
|  1 | SIMPLE      | ap2   | ref    | ap_product,ap_attribute | ap_attribute | 4       | const               |    1 | Using where              |
|  1 | SIMPLE      | ap3   | ref    | ap_product,ap_attribute | ap_attribute | 4       | const               |    1 | Using where              |
|  1 | SIMPLE      | p     | eq_ref | PRIMARY                 | PRIMARY      | 4       | temp.ap3.product_id |    1 | Using where; Using index |
+----+-------------+-------+--------+-------------------------+--------------+---------+---------------------+------+--------------------------+


See the MySQL documentation for an explanation of the EXPLAIN output.

This looks better than the plan for the OP's query:

+----+--------------------+---------------------+-------+-------------------------+--------------+---------+-------+------+--------------------------+
| id | select_type        | table               | type  | possible_keys           | key          | key_len | ref   | rows | Extra                    |
+----+--------------------+---------------------+-------+-------------------------+--------------+---------+-------+------+--------------------------+
|  1 | PRIMARY            | p                   | index | NULL                    | PRIMARY      | 4       | NULL  |    2 | Using where; Using index |
|  4 | DEPENDENT SUBQUERY | attributes_products | ref   | ap_product,ap_attribute | ap_attribute | 4       | const |    1 | Using where              |
|  3 | DEPENDENT SUBQUERY | attributes_products | ref   | ap_product,ap_attribute | ap_attribute | 4       | const |    1 | Using where              |
|  2 | DEPENDENT SUBQUERY | attributes_products | ref   | ap_product,ap_attribute | ap_attribute | 4       | const |    1 | Using where              |
+----+--------------------+---------------------+-------+-------------------------+--------------+---------+-------+------+--------------------------+


But results will vary from one database to another: a good query planner might be able to make something efficient out of the OP's query.

Code Snippets

SELECT p.id FROM products AS p
JOIN attributes_products AS ap1
  ON ap1.product_id = p.id AND ap1.attribute_id = 27 AND ap1.value = '...'
JOIN attributes_products AS ap2
  ON ap2.product_id = p.id AND ap2.attribute_id = 28 AND ap2.value = '...'
JOIN attributes_products AS ap3
  ON ap3.product_id = p.id AND ap3.attribute_id = 29 AND ap3.value = '...'
CREATE TABLE products (
    id INTEGER PRIMARY KEY AUTO_INCREMENT
);
CREATE TABLE attributes_products (
    product_id INTEGER NOT NULL,
    attribute_id INTEGER NOT NULL,
    value CHAR(40)
);
CREATE INDEX ap_product ON attributes_products (product_id);
CREATE INDEX ap_attribute ON attributes_products (attribute_id);

INSERT INTO products VALUES (1);
INSERT INTO products VALUES (2);
INSERT INTO attributes_products VALUES (1, 27, 'a');
INSERT INTO attributes_products VALUES (1, 28, 'b');
INSERT INTO attributes_products VALUES (1, 29, 'c');
+----+-------------+-------+--------+-------------------------+--------------+---------+---------------------+------+--------------------------+
| id | select_type | table | type   | possible_keys           | key          | key_len | ref                 | rows | Extra                    |
+----+-------------+-------+--------+-------------------------+--------------+---------+---------------------+------+--------------------------+
|  1 | SIMPLE      | ap1   | ref    | ap_product,ap_attribute | ap_attribute | 4       | const               |    1 | Using where              |
|  1 | SIMPLE      | ap2   | ref    | ap_product,ap_attribute | ap_attribute | 4       | const               |    1 | Using where              |
|  1 | SIMPLE      | ap3   | ref    | ap_product,ap_attribute | ap_attribute | 4       | const               |    1 | Using where              |
|  1 | SIMPLE      | p     | eq_ref | PRIMARY                 | PRIMARY      | 4       | temp.ap3.product_id |    1 | Using where; Using index |
+----+-------------+-------+--------+-------------------------+--------------+---------+---------------------+------+--------------------------+
+----+--------------------+---------------------+-------+-------------------------+--------------+---------+-------+------+--------------------------+
| id | select_type        | table               | type  | possible_keys           | key          | key_len | ref   | rows | Extra                    |
+----+--------------------+---------------------+-------+-------------------------+--------------+---------+-------+------+--------------------------+
|  1 | PRIMARY            | p                   | index | NULL                    | PRIMARY      | 4       | NULL  |    2 | Using where; Using index |
|  4 | DEPENDENT SUBQUERY | attributes_products | ref   | ap_product,ap_attribute | ap_attribute | 4       | const |    1 | Using where              |
|  3 | DEPENDENT SUBQUERY | attributes_products | ref   | ap_product,ap_attribute | ap_attribute | 4       | const |    1 | Using where              |
|  2 | DEPENDENT SUBQUERY | attributes_products | ref   | ap_product,ap_attribute | ap_attribute | 4       | const |    1 | Using where              |
+----+--------------------+---------------------+-------+-------------------------+--------------+---------+-------+------+--------------------------+

Context

StackExchange Code Review Q#33772, answer score: 5

Revisions (0)

No revisions yet.