HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Improve query "JOIN ... WHERE NOT EXISTS (SELECT 1 FROM ... WHERE ... )"

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
querywherejoinimproveexistsselectfromnot

Problem

I'm trying to optimize a query because the plan generates a full table scan for a subquery.

Data

The (simplified) table structure:

CREATE TABLE RECORD (
  ID INTEGER PRIMARY KEY AUTO_INCREMENT
);

CREATE TABLE POINT (
  ID INTEGER PRIMARY KEY AUTO_INCREMENT,
  RECORD_ID INTEGER NOT NULL,
  CONSTRAINT POINT_FK1 FOREIGN KEY (RECORD_ID) REFERENCES RECORD(ID)
);

CREATE TABLE SEGMENT (
  ID INTEGER PRIMARY KEY AUTO_INCREMENT,
  POINT_START INTEGER NOT NULL,
  POINT_END INTEGER NOT NULL,
  RECORD_ID INTEGER NOT NULL,
  CONSTRAINT SEGMENT_FK1 FOREIGN KEY (POINT_START) REFERENCES POINT(ID),
  CONSTRAINT SEGMENT_FK2 FOREIGN KEY (POINT_END) REFERENCES POINT(ID),
  CONSTRAINT SEGMENT_FK3 FOREIGN KEY (RECORD_ID) REFERENCES RECORD(ID)
);


The data (in real life the tables have hundreds of thousands of rows):

INSERT INTO RECORD(ID) VALUES (1), (2), (3);
INSERT INTO POINT(ID, RECORD_ID) VALUES (1, 1), (2, 1), (3, 1), (4, 1), (5, 2), (6, 2);
INSERT INTO SEGMENT(ID, POINT_START, POINT_END, RECORD_ID) VALUES (1, 1, 2, 1), (2, 3, 4, 1);


Query

The (simplified) original query:

SELECT
  RECORD.ID AS RECORD_ID,
  POINT.ID AS POINT_ID
FROM
  RECORD
  LEFT OUTER JOIN POINT
    ON RECORD.ID = POINT.RECORD_ID
    AND (NOT EXISTS (SELECT 1 FROM SEGMENT WHERE POINT_START = POINT.ID OR POINT_END = POINT.ID))


Results:

+-----------+----------+
| RECORD_ID | POINT_ID |
+-----------+----------+
|         1 |     NULL |
|         2 |        5 |
|         2 |        6 |
|         3 |     NULL |
+-----------+----------+


This has a query plan with a full table scan for the subquery:

```
+------+--------------------+---------+-------+-------------------------+-----------+---------+----------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------------+---------+-------+-------------------------+-----------+-

Solution

I tried the LEFT JOIN .. IS NULL idea I noted in the comment:

SELECT
  RECORD.ID AS RECORD_ID,
  POINT.ID AS POINT_ID
FROM
  RECORD
  LEFT OUTER JOIN (
    POINT
    LEFT JOIN SEGMENT s1 ON POINT.ID = s1.POINT_START
    LEFT JOIN SEGMENT s2 ON POINT.ID = s2.POINT_END
  ) ON RECORD.ID = POINT.RECORD_ID
      AND s1.POINT_START IS NULL AND s2.POINT_END IS NULL;


http://sqlfiddle.com/#!9/f6e9ca/1

The dependent subqueries are now joins instead, using the same indexes but it should have less overhead - dependent subquery means firing a subquery per main (outer) select row which is not as well optimized as joins are.

Code Snippets

SELECT
  RECORD.ID AS RECORD_ID,
  POINT.ID AS POINT_ID
FROM
  RECORD
  LEFT OUTER JOIN (
    POINT
    LEFT JOIN SEGMENT s1 ON POINT.ID = s1.POINT_START
    LEFT JOIN SEGMENT s2 ON POINT.ID = s2.POINT_END
  ) ON RECORD.ID = POINT.RECORD_ID
      AND s1.POINT_START IS NULL AND s2.POINT_END IS NULL;

Context

StackExchange Database Administrators Q#137384, answer score: 2

Revisions (0)

No revisions yet.