HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Why an incorrect JOIN using correlated sub-query is so much slower

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
whyincorrectmuchqueryslowersubcorrelatedjoinusing

Problem

I'm doing some fairly lightweight data massaging/cleaning and ran into a problem where one version of JOIN using a correlated sub-query (probably an erroneous one) ran much much slower than what I believe is the correct one. I'm not asking how to do the query (I believe I've now got that correct), but I'd like to know why the slow version is so slow.

The Problem

The domain is a fairly simple database to manage a Lottery Syndicate (recording members payments, games played and wins). In moving to a new engine (SQLite) I'm trying to clean the data and improve the tables' structure.

The existing _Winnings table records the amounts and dates won and the "game type" (there are multiple games that could be played):

CREATE TABLE [_Winnings](
    [ID]                integer primary key not null,
    [WinDate]           date,
    [Amount]            integer,
    [GameType]          integer references _Games(ID)
);
CREATE INDEX [_WinningsIndex] on _Winnings(GameType) ;


The main problem is that there is no link (other than the date of the win) to the actual game played. Those records have already been migrated and are now held in an EventHistory table:

CREATE TABLE [EventHistory](
    [ID]                integer primary key not null,
    [EventType]         integer references Events(ID),
    [GameType]          integer references Games(ID),
    [EventDate]         date
);
CREATE INDEX [EventHistoryEventIndex] on EventHistory(EventType) ;
CREATE INDEX [EventHistoryGameIndex]  on EventHistory(GameType) ;
CREATE INDEX [EventHistoryDateIndex]  on EventHistory(EventDate) ;


The three tables _Games, Games and Events hold the "type" of game/event and have essentially the following content:

```
_Games Games Events
ID GameType ID GameType ID Name
-- --------- -- --------- -- ----------
1 GameName1 1 GameName1 5 Dispersal
2 GameName2 1 GameName2

Solution

The difference is in how the correlated subquery does the search.

The fast subquery looks like this:

select min(ID)
from EventHistory
where EventHistory.EventDate = ?

-- SEARCH TABLE EventHistory USING COVERING INDEX EventHistoryDateIndex (EventDate=?)


There is an index on EventDate, so the database can look up the matching rows in that index, and then remember and return only the smallest ID value.

The slow subquery looks like this:

select min(ID)
from EventHistory
where ? = ?

-- SEARCH TABLE EventHistory


The two values are constant (as far as the subquery is concerned), so either all rows of the table match, or none. But the query optimizer is not smart enough to recognize this, so it goes through all rows of the table and evaluates the WHERE clause each time.

(There is the MIN/MAX optimization, but it works only when there is no WHERE clause.)

That you are inserting new rows into the tables has no effect on the speed. However, SQLite computes result rows on demand, if possible, so modifying tables while you are reading them can make the results inconsistent. You should read all results of the query first, or use a temporary table.

Code Snippets

select min(ID)
from EventHistory
where EventHistory.EventDate = ?

-- SEARCH TABLE EventHistory USING COVERING INDEX EventHistoryDateIndex (EventDate=?)
select min(ID)
from EventHistory
where ? = ?

-- SEARCH TABLE EventHistory

Context

StackExchange Database Administrators Q#231250, answer score: 2

Revisions (0)

No revisions yet.