HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Query 'optimizing' state is taking too long

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
querylongtoooptimizingstatetaking

Problem

I have a MySQL 5.6 on Amazon RDS that I'm using for testing some data archiving scripts. I'm removing oldest data based on a "updated_date" column and index. Curiously, after removing a few million rows, my script gets stuck on the initial query it does for determining data bounds.

I run a query like this:

SELECT min(updated_date) as oldest, max(updated_date) AS newest FROM `order`;


The explain command on this query shows:

'1', 'SIMPLE', NULL, NULL, NULL, NULL, NULL, NULL, NULL, 'Select tables optimized away'


So, it's supposed to hit the index and run almost instantly, and it did when the testing started, but now, after millions of rows were removed, it gets stuck in the 'optimizing' state for several minutes.

The script is the only thing running on the database.

Any ideas on what's wrong with it? Am I supposed to do something when removing lots of rows like that? Do I have to run optimize table, even though I'm not using delete quick?

Update #1

The result from show table status like 'order':

Name,Engine,Version,Row_format,Rows,Avg_row_length,Data_length,Max_data_length,Index_length,Data_free,Auto_increment,Create_time,Update_time,Check_time,Collation,Checksum,Create_options,Comment
order,InnoDB,10,Compact,568037197,280,159252496384,0,180806041600,37692112896,4052226884,"2015-01-26 17:27:20",NULL,NULL,utf8_general_ci,NULL,,


The result from select count(*) from order is 618376777 rows.

Unfortunately, I can't post the whole schema here, but where it bears on the issue, the result from show create table order is:

CREATE TABLE `order` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  // 31 data columns here
  `updated_date` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`),       
  KEY `ix_order_updated_date` (`updated_date`),
  // 9 indexes here
) ENGINE=InnoDB AUTO_INCREMENT=4052226884 DEFAULT CHARSET=utf8;


Update #2

By separating the min() and max() calls in two queries, I noticed only the min() query

Solution

SELECT min(updated_date) as oldest, max(updated_date) AS newest FROM `order`;


is well optimized if you have INDEX(updated_date) (which you do). It will do two probes. No need for two queries, etc.

The EXPLAIN said 'optimized away' because it could see that it did not need to do any work.

I checked with a similar table, then checked the Handler% STATUS values -- Handler_read_first and Handler_read_last incremented by 1 each. This is indicative of optimizing MIN and MAX.

Do not use OPTIMIZE TABLE it is (usually) a long wast of time.

It is a huge table.

DELETEing a million rows will queue up a lot of stuff, especially because of 10 secondary indexes. I'll bet that when you asked for MAX and MIN, it had to finish up all the pending index updates.

If that is the case, none of the suggestions will really solve the problem. Chunking the deletes (which is a good idea anyway) might have "hidden" the problem by slowing down the delete task. As @eroomydna says, the undo logs must have been huge.

On what basis are you DELETEing? If it is "purge all records older than X", then this is a cure: PARTITION BY RANGE(...) on the date (to use for purging) and DROP PARTITION to jetison old rows. This is also much cleaner (in the sense of OPTIMIZEing to defragment) than DELETEing.

Rolando, ANALYZE TABLE will recalc the stats "instantly". OPTIMIZE TABLE rebuilds (and ANALYZEs) the table but takes forever on this sized table.

Potentially valid is to copy over the useful rows, a la Rolando's #4. But only if you are jettisoning "most" of the table. I suggest that is not the case here -- "deleting a few million rows" out of 618M.

10 secondary keys is a lot. Would you care to show them; we may have suggestions on pruning the list. For this sized table, it is costly to maintain that many.

Edit -- "Change buffering"

I believe what I have described is called change buffering for DELETEs. More discussion:
http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-performance-change_buffering.html
http://dev.mysql.com/doc/refman/5.6/en/innodb-performance-change_buffering.html

You could try innodb_change_buffering = none.

Code Snippets

SELECT min(updated_date) as oldest, max(updated_date) AS newest FROM `order`;

Context

StackExchange Database Administrators Q#98970, answer score: 2

Revisions (0)

No revisions yet.