patternsqlMajor
Why is this sqlite query much slower when I index the columns?
Viewed 0 times
thiswhythemuchcolumnsquerysqliteslowerwhenindex
Problem
I have a sqlite database with two tables, each with 50,000 rows in, containing names of (fake) people. I've constructed a simple query to find out how many names there are (given name, middle initial, surname) that are common to both tables:
When there are no indexes except on the primary keys (irrelevant to this query), it runs quickly:
But if I add indexes to the three columns on each table (six indexes in all):
then it runs painfully slowly:
Is there any rhyme or reason to this?
Here's the result of
This is with indexes:
select count(*) from fakenames_uk inner join fakenames_usa on fakenames_uk.givenname=fakenames_usa.givenname and fakenames_uk.surname=fakenames_usa.surname and fakenames_uk.middleinitial=fakenames_usa.middleinitial;When there are no indexes except on the primary keys (irrelevant to this query), it runs quickly:
[james@marlon Downloads] $ time sqlite3 generic_data_no_indexes.sqlite "select count(*) from fakenames_uk inner join fakenames_usa on fakenames_uk.givenname=fakenames_usa.givenname and fakenames_uk.surname=fakenames_usa.surname and fakenames_uk.middleinitial=fakenames_usa.middleinitial;"
131
real 0m0.115s
user 0m0.111s
sys 0m0.004sBut if I add indexes to the three columns on each table (six indexes in all):
CREATE INDEX `idx_uk_givenname` ON `fakenames_uk` (`givenname` )
//etc.then it runs painfully slowly:
[james@marlon Downloads] $ time sqlite3 generic_data.sqlite "select count(*) from fakenames_uk inner join fakenames_usa on fakenames_uk.givenname=fakenames_usa.givenname and fakenames_uk.surname=fakenames_usa.surname and fakenames_uk.middleinitial=fakenames_usa.middleinitial;"
131
real 1m43.102s
user 0m52.397s
sys 0m50.696sIs there any rhyme or reason to this?
Here's the result of
EXPLAIN QUERY PLAN for the version without indexes:0|0|0|SCAN TABLE fakenames_uk
0|1|1|SEARCH TABLE fakenames_usa USING AUTOMATIC COVERING INDEX (middleinitial=? AND surname=? AND givenname=?)This is with indexes:
0|0|0|SCAN TABLE fakenames_uk
0|1|1|SEARCH TABLE fakenames_usa USING INDEX idx_us_middleinitial (middleinitial=?)Solution
In SQLite, joins are executed as nested loop joins, i.e., the database goes through one table, and for each row, searches matching rows from the other table.
If there is an index, the database can look up any matches in the index quickly, and then go to the corresponding table row to get the values of any other columns that are needed.
In this case, there are three possible indexes. Without any statistical information (which would be created by running ANALYZE), the database chooses the smallest one, to reduce I/O. However, the
If there is no index, the lookup of matching rows would require a complete table scan of the second table for each row of the first table. This would be so bad that the database estimates that it is worthwhile to create and then drop a temporary index just for this query.
This temporary ("AUTOMATIC") index is created on all colunms used for the search. The COUNT(*) operation does not need values from any other columns, so this index happens to be a covering index, which means that it is not necessary to actually look up the table row corresponding to an index entry, which saves even more I/O.
To speed up this query, create this index permanently, so that it is no longer necessary to construct a temporary one:
The index on
The index on
The index on
If there is an index, the database can look up any matches in the index quickly, and then go to the corresponding table row to get the values of any other columns that are needed.
In this case, there are three possible indexes. Without any statistical information (which would be created by running ANALYZE), the database chooses the smallest one, to reduce I/O. However, the
middleinitial index is useless because it does not greatly reduce the number of table rows that need to be fetched; and the additional step through the index actually increases the I/O needed because the table rows are no longer read in order, but randomly.If there is no index, the lookup of matching rows would require a complete table scan of the second table for each row of the first table. This would be so bad that the database estimates that it is worthwhile to create and then drop a temporary index just for this query.
This temporary ("AUTOMATIC") index is created on all colunms used for the search. The COUNT(*) operation does not need values from any other columns, so this index happens to be a covering index, which means that it is not necessary to actually look up the table row corresponding to an index entry, which saves even more I/O.
To speed up this query, create this index permanently, so that it is no longer necessary to construct a temporary one:
CREATE INDEX uk_all_names ON fakenames_uk(surname, givenname, middleinitial);
EXPLAIN QUERY PLAN
SELECT count(*)
FROM fakenames_uk
JOIN fakenames_usa USING (givenname, middleinitial, surname);
0|0|1|SCAN TABLE fakenames_usa
0|1|0|SEARCH TABLE fakenames_uk USING COVERING INDEX uk_all_names (surname=? AND givenname=? AND middleinitial=?)The index on
surname is no longer needed because the three-column index can be used for any lookups on this column.The index on
givenname might be useful if you will do lookups on this column only.The index on
middleinitial is always worthless: a query that searches for one of the 26 possible values is faster if it just scans the entire table.Code Snippets
CREATE INDEX uk_all_names ON fakenames_uk(surname, givenname, middleinitial);
EXPLAIN QUERY PLAN
SELECT count(*)
FROM fakenames_uk
JOIN fakenames_usa USING (givenname, middleinitial, surname);
0|0|1|SCAN TABLE fakenames_usa
0|1|0|SEARCH TABLE fakenames_uk USING COVERING INDEX uk_all_names (surname=? AND givenname=? AND middleinitial=?)Context
StackExchange Database Administrators Q#150858, answer score: 22
Revisions (0)
No revisions yet.