MySQL performance optimization, max CPU load, low QPS
Problem
I would appreciate it if anybody could help me optimize the performance of this setup:
I have Ubuntu 12.04 running on VMware 5.1, with 32 GB RAM and 8 cores (no problems regarding CPU scheduling, as the VM is almost alone on the host).
The hardware is an IBM blade with 2x E5-2660 CPUs.
I'm running MySQL 5.5, and have a table looking like this:
```sql
CREATE TABLE `ochrange` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `rangestart` int(8) NOT NULL,
  `rangeend` int(8) NOT NULL,
  `rangelength` int(11) NOT NULL DEFAULT '1',
  `networkoperator` varchar(6) COLLATE latin1_danish_ci NOT NULL,
  `serviceoperator` varchar(6) COLLATE latin1_danish_ci NOT NULL,
  `numbertype` varchar(6) COLLATE latin1_danish_ci NOT NULL,
  `lastupdate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `lastupdateFile` varchar(255) COLLATE latin1_danish_ci NOT NULL,
  PRIMARY KEY (`id`),
  KEY `rangestart_2` (`rangestart`,`rangeend`),
  KEY `rangelength` (`rangelength`)
) ENGINE=MyISAM AUTO_INCREMENT=189138 DEFAULT CHARSET=latin1 COLLATE=latin1_danish_ci
```

The table contains 187,500 rows.
I'm running queries like this:

```sql
SELECT networkoperator, numbertype
FROM och.ochrange
WHERE '20972128'
      BETWEEN rangestart AND rangeend
ORDER BY rangelength ASC LIMIT 1;
```
The EXPLAIN output:

```
mysql> EXPLAIN SELECT networkoperator,numbertype
    -> FROM och.ochrange
    -> WHERE '20972128'
    -> BETWEEN rangestart AND rangeend
    -> ORDER BY `rangelength` ASC LIMIT 1;
+----+-------------+----------+-------+---------------+-------------+---------+------+------+-------------+
| id | select_type | table    | type  | possible_keys | key         | key_len | ref  | rows | Extra       |
+----+-------------+----------+-------+---------------+-------------+---------+------+------+-------------+
|  1 | SIMPLE      | ochrange | index | rangestart_2  | rangelength | 4       | NULL |   46 | Using where |
+----+-------------+----------+-------+---------------+-------------+---------+------+------+-------------+
1 row in set (0.
```
Solution
Based on the query plan in the EXPLAIN output you've posted, you may have difficulty believing how this query is actually being processed by the server... but the explanation does illustrate why performance is poor.
Since you've asked for the results to be ordered by `rangelength`, and since the B-TREE index on (rangestart,rangeend) is poorly suited to resolving "x between y and z" expressions, the optimizer has decided to use the index on `rangelength` to determine an order in which it will proceed to read every row, if necessary, in the entire table (type = index), in ascending order as sorted by the index on `rangelength` (key = rangelength), until it finds the first row where (Extra = Using where) the WHERE clause is matched. Since the rows are being read in the desired ORDER BY order, the server can stop after the first matching row... so I would assume this query exhibits substantial variability depending on how much of the table or index has to be scanned to resolve any particular value.

There are two approaches that come to mind to improve this.
Option 1: The first suggestion would be to add an index that includes all three values you're sorting by and selecting by... but not for the usual reasons, because the query isn't going to use it quite like that.
```sql
ALTER TABLE ochrange ADD KEY (rangelength, rangeend, rangestart);
```

This is still far from the ideal index for this query, but it has three advantages over what you have now:
- it's already sorted by rangelength
- it narrows down the number of comparisons needed, because for any given rangelength, the rangestart values that are too high can be ignored
- while it isn't truly a covering index, all of the values of interest in the WHERE clause are found within the index, so the optimizer should be able to qualify or disqualify the rows based on an index scan instead of having to read the table data, and might be able to do more.
Very important note on point 3: I am not saying this index will be used to look up the matching rows, because it can't exactly be used for that. It should at least, however, be used more effectively than the current plan because it contains values we need to use for filtering and because it may also allow out of range values for rangeend to be quickly eliminated, and among the remaining values, it may allow out of range values for rangestart to also be eliminated.
I would also recommend that the where clause be written in a less-ambiguous but logically equivalent form, to potentially make things a little bit easier on the optimizer:
```sql
WHERE 20972128
      BETWEEN `rangestart` AND `rangeend`
ORDER BY `rangelength` ASC LIMIT 1;
```

...becomes this:

```sql
WHERE rangestart <= 20972128
  AND rangeend >= 20972128
ORDER BY rangelength ASC LIMIT 1;
```

The (rangestart,rangeend) index, at first glance, seems like it would have been more useful, but a 2-column B-Tree isn't well suited to finding values between a low and a high bound like this.

The residential telephone directory is a fitting analogy for a two-column index on (last_name, first_name), and illustrates why this kind of index doesn't provide as much benefit as it would seem to.
In such a directory, given the last name "Smith" and first name "John," it is easy to find all of the people named Smith, and very easy to find first name John accompanying last name Smith. It is quite impossible, though, to use the index in a telephone directory to find all of the people with first name John regardless of last name.
What we're asking of the index in this query, whether written the original way or in my suggested way, is to find all of the rows where `rangestart` <= 20972128 and `rangeend` >= 20972128 in the same row. This would be like trying to find all of the people in a telephone directory with a last name of Smith, or any other last name that appears prior to Smith in the directory, and of those people, find those with the first name of John, or any other name that is lexically (alphabetically) "greater than" (after) John. The task would be tedious, and our only consolation is that we don't have to examine any of the pages in the directory that follow Smith; we still have to examine every entry on every preceding page before we can find what we are looking for.
Still, Option #1, adding a new index, seems worth a try. After testing with that index, it could also be worthwhile to add yet another index, this one on (rangelength,rangestart,rangeend), to see which one the optimizer prefers to use. Hopefully it will use one of them, and depending on the data in the table and the values in the query, it might alternate and it might not.
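That comparison could be sketched as follows; the index name is my own invention, and which index the optimizer prefers will depend on your data:

```sql
-- Sketch: add the alternate index suggested above (name is an assumption).
ALTER TABLE ochrange ADD KEY rangelength_2 (rangelength, rangestart, rangeend);

-- Check which index the optimizer chooses for the rewritten query:
EXPLAIN SELECT networkoperator, numbertype
FROM och.ochrange
WHERE rangestart <= 20972128
  AND rangeend >= 20972128
ORDER BY rangelength ASC LIMIT 1;
```

If neither index is chosen consistently, an index hint (`USE INDEX` / `FORCE INDEX`) can be used to benchmark each one directly.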
Option #2 is apparently a little bit "outside the box" in some people's minds, but it is a solution I use for finding the specific block of IP addresses (IPv4 addresses are essentially INT UNSIGNED, with low/high boundaries) in which a particular IP address lies, for geocoding: the ranges are stored as geometry and searched with MBRContains() against a SPATIAL (R-tree) index, which, unlike a B-tree, is designed for exactly this kind of containment test. I caught some grief for suggesting this.
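The `MBRContains(b.node_polygon, POINT(...))` query collected under Code Snippets below shows the mechanism. A minimal sketch of the same technique adapted to this table, under my own assumptions: the column name and polygon construction are illustrative, and SPATIAL indexes in MySQL 5.5 require MyISAM, which this table already uses.

```sql
-- Assumption: represent each [rangestart, rangeend] interval as a thin
-- rectangle along the X axis, so an R-tree (SPATIAL) index can locate
-- the interval containing a point. Names here are not from the answer.
ALTER TABLE ochrange ADD COLUMN range_polygon GEOMETRY NULL;

UPDATE ochrange
SET range_polygon = Polygon(LineString(
      Point(rangestart, -1), Point(rangeend,   -1),
      Point(rangeend,    1), Point(rangestart,  1),
      Point(rangestart, -1)));

-- SPATIAL indexes require NOT NULL, so tighten the column, then index it:
ALTER TABLE ochrange MODIFY range_polygon GEOMETRY NOT NULL,
                     ADD SPATIAL INDEX (range_polygon);

-- The lookup becomes an R-tree containment test instead of a B-tree scan:
SELECT networkoperator, numbertype
FROM ochrange
WHERE MBRContains(range_polygon, Point(20972128, 0))
ORDER BY rangelength ASC LIMIT 1;
```

This is the same shape as the geo_block query in the snippets below, applied to number ranges instead of IP blocks.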
Code Snippets

```sql
ALTER TABLE ochrange ADD KEY (rangelength, rangeend, rangestart);
```

```sql
WHERE 20972128
      BETWEEN `rangestart` AND `rangeend`
ORDER BY `rangelength` ASC LIMIT 1;
```

```sql
WHERE rangestart <= 20972128
  AND rangeend >= 20972128
ORDER BY rangelength ASC LIMIT 1;
```

```sql
SELECT ...
FROM geo_block b
WHERE MBRContains(b.node_polygon, POINT(in_ip_unsigned, in_ip_unsigned))
```

```sql
WHERE YEAR(birthday) = 1973;                                 # bad
WHERE birthday >= '1973-01-01' AND birthday < '1974-01-01';  # good
```

Context
StackExchange Database Administrators Q#54117, answer score: 16