patternsqlMinor
Why an incorrect JOIN using correlated sub-query is so much slower
Viewed 0 times
whyincorrectmuchqueryslowersubcorrelatedjoinusing
Problem
I'm doing some fairly lightweight data massaging/cleaning and ran into a problem where one version of JOIN using a correlated sub-query (probably an erroneous one) ran much much slower than what I believe is the correct one. I'm not asking how to do the query (I believe I've now got that correct), but I'd like to know why the slow version is so slow.
The Problem
The domain is a fairly simple database to manage a Lottery Syndicate (recording members payments, games played and wins). In moving to a new engine (SQLite) I'm trying to clean the data and improve the tables' structure.
The existing
The main problem is that there is no link (other than the date of the win) to the actual game played. Those records have already been migrated and are now held in an
The three tables
```
_Games Games Events
ID GameType ID GameType ID Name
-- --------- -- --------- -- ----------
1 GameName1 1 GameName1 5 Dispersal
2 GameName2 1 GameName2
The Problem
The domain is a fairly simple database to manage a Lottery Syndicate (recording members payments, games played and wins). In moving to a new engine (SQLite) I'm trying to clean the data and improve the tables' structure.
The existing
_Winnings table records the amounts and dates won and the "game type" (there are multiple games that could be played):CREATE TABLE [_Winnings](
[ID] integer primary key not null,
[WinDate] date,
[Amount] integer,
[GameType] integer references _Games(ID)
);
CREATE INDEX [_WinningsIndex] on _Winnings(GameType) ;The main problem is that there is no link (other than the date of the win) to the actual game played. Those records have already been migrated and are now held in an
EventHistory table:CREATE TABLE [EventHistory](
[ID] integer primary key not null,
[EventType] integer references Events(ID),
[GameType] integer references Games(ID),
[EventDate] date
);
CREATE INDEX [EventHistoryEventIndex] on EventHistory(EventType) ;
CREATE INDEX [EventHistoryGameIndex] on EventHistory(GameType) ;
CREATE INDEX [EventHistoryDateIndex] on EventHistory(EventDate) ;The three tables
_Games, Games and Events hold the "type" of game/event and have essentially the following content:```
_Games Games Events
ID GameType ID GameType ID Name
-- --------- -- --------- -- ----------
1 GameName1 1 GameName1 5 Dispersal
2 GameName2 1 GameName2
Solution
The difference is in how the correlated subquery does the search.
The fast subquery looks like this:
There is an index on
The slow subquery looks like this:
The two values are constant (as far as the subquery is concerned), so either all rows of the table match, or none. But the query optimizer is not smart enough to recognize this, so it goes through all rows of the table and evaluates the WHERE clause each time.
(There is the MIN/MAX optimization, but it works only when there is no WHERE clause.)
That you are inserting new rows into the tables has no effect on the speed. However, SQLite computes result rows on demand, if possible, so modifying tables while you are reading them can make the results inconsistent. You should read all results of the query first, or use a temporary table.
The fast subquery looks like this:
select min(ID)
from EventHistory
where EventHistory.EventDate = ?
-- SEARCH TABLE EventHistory USING COVERING INDEX EventHistoryDateIndex (EventDate=?)There is an index on
EventDate, so the database can look up the matching rows in that index, and then remember and return only the smallest ID value.The slow subquery looks like this:
select min(ID)
from EventHistory
where ? = ?
-- SEARCH TABLE EventHistoryThe two values are constant (as far as the subquery is concerned), so either all rows of the table match, or none. But the query optimizer is not smart enough to recognize this, so it goes through all rows of the table and evaluates the WHERE clause each time.
(There is the MIN/MAX optimization, but it works only when there is no WHERE clause.)
That you are inserting new rows into the tables has no effect on the speed. However, SQLite computes result rows on demand, if possible, so modifying tables while you are reading them can make the results inconsistent. You should read all results of the query first, or use a temporary table.
Code Snippets
select min(ID)
from EventHistory
where EventHistory.EventDate = ?
-- SEARCH TABLE EventHistory USING COVERING INDEX EventHistoryDateIndex (EventDate=?)select min(ID)
from EventHistory
where ? = ?
-- SEARCH TABLE EventHistoryContext
StackExchange Database Administrators Q#231250, answer score: 2
Revisions (0)
No revisions yet.