patternsqlMinor
MySQL Query Optimization : Indexing and Pagination
Viewed 0 times
paginationindexingqueryoptimizationmysqland
Problem
I'm adding pagination to my 'latest news' php script and running in to an issue.
I have indexes on approved, time columns.
The problem is when I do an EXPLAIN on the query, it shows it is using the 'approved' index - but it still has to scan all 1000 rows to get the 10 new entries I want to display.
Anyway to optimze the query so that doesn't happen?
SELECT sid, title, time, bodytext, author, url FROM news WHERE approved=1 ORDER BY time desc LIMIT $start, $limitI have indexes on approved, time columns.
The problem is when I do an EXPLAIN on the query, it shows it is using the 'approved' index - but it still has to scan all 1000 rows to get the 10 new entries I want to display.
Anyway to optimze the query so that doesn't happen?
Solution
At this point, you will have to create an index on three columns
You must then refactor the query just slightly to reduce two things the MySQL Query Optimizer is expected to handle:
Here is your original query:
What you can do is gather all the keys first:
This will perform a full index scan rather than a full table scan because all three columns are in the index. That's a lot less baggage to walk around with for temp table generation because you are scanning just 3 columns in a smaller resource (the index) as opposed to 6 columns as from a bigger resource (the table) as well as not sticking url and bodytext (which are probably VARCHAR(300) and TEXT fields) into the mix just for the sake of gathering keys.
Next, make the SELECT sid query into an inline table and connect them back to the news table using only the fetched keys. EXAMPLE : Suppose your LIMIT variables are 200,10. This means you want to move to the 201st row of the news table and get 10 keys from that point. This means that no matter which page you are on, you want to collect 10 keys at a time, and only 10 keys at a time.
Here is the new and improved query:
The only real cost here is doing a full index scan for keys rather than a full table scan. The benefit is that once you get the 10 keys from the inline SELECT, then only 10 titles, bodytexts, authors and urls are retrieved.
Give it a Try !!!
CAVEAT
As a rule of thumb, anytime you index a column whose cardinality is very low (i.e., Male/Female (2), Single/Married/Divorced/Widowed (4), True/False (2), 0/1 (2)) and the number of rows for any one value exceeds 5% of the total rows in the table, the MySQL Query Optimizer will rule out any and all indexes and you will perform can full table scan or a bad index gets chosen and you end up with a full index scan. This is why it is very imperative that you find the right column distribution, or at the very least perform index scans instead of table scans.
UPDATE 2011-08-08 11:13 EDT
What was I thinking ? This was my original propsed index:
The field should be approved not authored. This is what it should be:
The original query you gave was
The index I am proposing (approved_time_sid_ndx) will include approved, time, and sid.
The answer that you just submitted has this query :
This being the case, the index you need now should be this:
Both approved and sid should be together. IF they are not together, the MySQL Query Optimizer may decide to perform an internal index merge of the primary key and an index where approved is the first (or only) column. In fact, your new query should be refactored as follows:
Remember, you want an index that encompasses as many DB columns as possible that are embedded in WHERE and ORDER BY clauses.
ALTER TABLE news ADD INDEX authored_time_sid_ndx (authored,time,sid);You must then refactor the query just slightly to reduce two things the MySQL Query Optimizer is expected to handle:
- the number of rows read
- the amount of data collected
Here is your original query:
SELECT sid, title, time, bodytext, author, url FROM news WHERE approved=1 ORDER BY time desc LIMIT $start, $limitWhat you can do is gather all the keys first:
SELECT sid FROM news WHERE approved=1 ORDER BY time desc LIMIT $start, $limitThis will perform a full index scan rather than a full table scan because all three columns are in the index. That's a lot less baggage to walk around with for temp table generation because you are scanning just 3 columns in a smaller resource (the index) as opposed to 6 columns as from a bigger resource (the table) as well as not sticking url and bodytext (which are probably VARCHAR(300) and TEXT fields) into the mix just for the sake of gathering keys.
Next, make the SELECT sid query into an inline table and connect them back to the news table using only the fetched keys. EXAMPLE : Suppose your LIMIT variables are 200,10. This means you want to move to the 201st row of the news table and get 10 keys from that point. This means that no matter which page you are on, you want to collect 10 keys at a time, and only 10 keys at a time.
Here is the new and improved query:
SELECT A.sid, A.title, A.time, A.bodytext, A.author, A.url
FROM
news A INNER JOIN
(SELECT sid FROM news WHERE approved=1 ORDER BY time desc LIMIT $start, $limit) B
USING (sid);The only real cost here is doing a full index scan for keys rather than a full table scan. The benefit is that once you get the 10 keys from the inline SELECT, then only 10 titles, bodytexts, authors and urls are retrieved.
Give it a Try !!!
CAVEAT
As a rule of thumb, anytime you index a column whose cardinality is very low (i.e., Male/Female (2), Single/Married/Divorced/Widowed (4), True/False (2), 0/1 (2)) and the number of rows for any one value exceeds 5% of the total rows in the table, the MySQL Query Optimizer will rule out any and all indexes and you will perform can full table scan or a bad index gets chosen and you end up with a full index scan. This is why it is very imperative that you find the right column distribution, or at the very least perform index scans instead of table scans.
UPDATE 2011-08-08 11:13 EDT
What was I thinking ? This was my original propsed index:
ALTER TABLE news ADD INDEX authored_time_sid_ndx (authored,time,sid);The field should be approved not authored. This is what it should be:
ALTER TABLE news ADD INDEX approved_time_sid_ndx (approved,time,sid);The original query you gave was
SELECT sid, title, time, bodytext, author, url FROM news WHERE approved=1 ORDER BY time desc LIMIT $start, $limitThe index I am proposing (approved_time_sid_ndx) will include approved, time, and sid.
The answer that you just submitted has this query :
SELECT sid, title, time, bodytext, author, url FROM news WHERE approved = 1 AND sid > 0 ORDER BY sid desc LIMIT 500, 10;This being the case, the index you need now should be this:
ALTER TABLE news ADD INDEX approved_sid_ndx (approved,sid);Both approved and sid should be together. IF they are not together, the MySQL Query Optimizer may decide to perform an internal index merge of the primary key and an index where approved is the first (or only) column. In fact, your new query should be refactored as follows:
SELECT A.sid, A.title, A.time, A.bodytext, A.author, A.url
FROM
news A INNER JOIN
(SELECT sid FROM news WHERE approved=1 and sid > 0 ORDER BY sid desc LIMIT 500,10) B
USING (sid);Remember, you want an index that encompasses as many DB columns as possible that are embedded in WHERE and ORDER BY clauses.
Code Snippets
ALTER TABLE news ADD INDEX authored_time_sid_ndx (authored,time,sid);SELECT sid, title, time, bodytext, author, url FROM news WHERE approved=1 ORDER BY time desc LIMIT $start, $limitSELECT sid FROM news WHERE approved=1 ORDER BY time desc LIMIT $start, $limitSELECT A.sid, A.title, A.time, A.bodytext, A.author, A.url
FROM
news A INNER JOIN
(SELECT sid FROM news WHERE approved=1 ORDER BY time desc LIMIT $start, $limit) B
USING (sid);ALTER TABLE news ADD INDEX authored_time_sid_ndx (authored,time,sid);Context
StackExchange Database Administrators Q#4191, answer score: 7
Revisions (0)
No revisions yet.