HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

search misses indexed column

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
missesindexedsearchcolumn

Problem

I have a table like this

CREATE TABLE `agentpositions` (
    `idPosition` INT(11) NOT NULL AUTO_INCREMENT,
    `idAgent` INT(11) NULL DEFAULT NULL,
    `hardwareSerial` VARCHAR(128) NOT NULL,
    `LatitudePoint` DECIMAL(18,12) NOT NULL,
    `LongitudePoint` DECIMAL(18,12) NOT NULL,
    `PointDate` DATE NULL DEFAULT NULL,
    `TimeStamp` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    ...
    PRIMARY KEY (`idPosition`),
    INDEX `idAgent` (`idAgent`),
    INDEX `idagent_pointdate` (`idAgent`, `PointDate`) -- next one is getting ignored and this one is getting used
    INDEX `pointdate_idagent` (`PointDate`, `idAgent`) -- this is getting ignored
)
ENGINE=InnoDB


we query this table with both (idAgent,TimeStamp) and because this table is going to get very big, we put index on both of them

but for sake of architecture (we think date-int is faster and more optimized than timestamp-int, is that event true?? ) we avoided putting index on timestamp, instead we added another field which stores only date of the timestamp, and added index on that, but I don't know why it ignores my index

my query looked like this:

SELECT idPosition , idAgent , LatitudePoint , LongitudePoint , TimeStamp AS StartTime 
FROM `agentpositions` `t` IGNORE INDEX(`idAgent`, PRIMARY)
WHERE (
   t.idAgent=880 AND t.PointDate >= "2015-04-04" 
   AND (t.TimeStamp >= "2015-04-04 00:00:00" and t.TimeStamp < "2015-04-04 23:59:59")) 
   AND t.LatitudePoint != 0 and t.LongitudePoint != 0  
ORDER BY t.TimeStamp


so after not identifying the index, I managed to change it to this which uses that index!

SELECT idPosition , idAgent , LatitudePoint , LongitudePoint , TimeStamp AS StartTime
FROM agentpositions t IGNORE INDEX(idAgent, PRIMARY)
WHERE
t.idAgent=880 AND t.PointDate in ("2015-04-04","2015-04-05","2015-04-03" )
AND t.LatitudePoint != 0 and t.LongitudePoint != 0
order by t.TimeStamp


I want to be able to give time range like >='2014-01-01'

what is wrong with this index? how

Solution

This is a nice a example to explain how the query optimizer works, because in my test server, with a couple or rows, your first query uses an index (look at the usage of EXPLAIN):

root@127.0.0.1(test)> EXPLAIN SELECT idPosition , idAgent , LatitudePoint , LongitudePoint , TimeStamp AS StartTime 
    -> FROM `agentpositions` `t` IGNORE INDEX(`idAgent`, PRIMARY)
    -> WHERE (
    ->    t.idAgent=880 AND t.PointDate >= "2015-04-04" 
    ->    AND (t.TimeStamp >= "2015-04-04 00:00:00" and t.TimeStamp     AND t.LatitudePoint != 0 and t.LongitudePoint != 0  
    -> ORDER BY t.TimeStamp\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
   partitions: NULL
         type: range
possible_keys: idagent_pointdate
          key: idagent_pointdate
      key_len: 9
          ref: NULL
         rows: 1
     filtered: 7.11
        Extra: Using index condition; Using where; Using filesort
1 row in set, 1 warning (0.00 sec)


While the second doesn't:

root@127.0.0.1(test)> EXPLAIN SELECT idPosition , idAgent , LatitudePoint , LongitudePoint , TimeStamp AS StartTime
    -> FROM agentpositions t IGNORE INDEX(idAgent, PRIMARY)
    -> WHERE
    -> t.idAgent=880 AND t.PointDate in ("2015-04-04","2015-04-05","2015-04-03" )
    -> AND t.LatitudePoint != 0 and t.LongitudePoint != 0
    -> order by t.TimeStamp\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
   partitions: NULL
         type: ALL
possible_keys: idagent_pointdate
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5
     filtered: 38.40
        Extra: Using where; Using filesort
1 row in set, 1 warning (0.01 sec)


The main reason why an index gets or not used is, first- if it can be used (this happens for both cases, as you can see on the possible_keys row, and second, if it considers using the key the best strategy to retrieving the rows (based on the cardinality of the columns).

I have shown this graph many times:

If the filter that can be used for an index is not very selective (for example, the range is too large), the optimizer will prefer to use a full table scan, as it is a faster strategy for reading a large number of rows. If you want to know the details for your particular query, and youa re using MySQL 5.6, you can use the OPTIMIZER_TRACE feature to understand why one plan was preferred over another.

If you believe a discarded plan is better, you can force the usage of an index by using FORCE INDEX(idagent_pointdate). Be careful because manually forcing the usage of an index is a dangerous practice, as if the data cardinality changes and the usage of the index is no longer the best plan, you will continue blindly using it.

Aside from your main question, there are many other issues regarding your query and indexes, that I will try to address here:

-
Your idAgent index is redundant- you can use idagent_pointdate for everything that you can do with that index, so it is useless, and will take away resources (memory, disk) but also slow down certain tasks like the query optimizer having to take it into account and writes (as an additional index has to be updated as well). Use tools like pt-duplicate-key-checker to search for redundant indexes on your database.

-
I do not get well what you mean with date-int and timestamp-int, but if you mean the practicality of having what basically is having redundant information, one storing second precision and another storing days: that may a useful physical optimization of the table IF you need to index days instad of timestamps. If you just need to index the timestamp using ranges, like t.TimeStamp >= "2015-04-04" and t.TimeStamp

-
You will be unable to use 2 range accesses to two separate columns for the same index. Using
IN + a list of values is a legit workaround for this limitation (which is intrinsic of B(+)TREE indexes). This may be your main issue here.

-
The use of the range cancels the possibility (in this particular case) of using the index for sorting- this cannot be workaround -either you use the index for filtering or for sorting.

Update: Because of B(+)TREE indexes, you can only use the first range for a multi-column index. I have another drawing of why here:

-
A composite index is just a BTREE ordered by the concatenated values:

-
The index with the reverse column order cannot be used in such an effective way because of the range at the start:

Technically, both indexes would be equally good as the should be of similar size, but for this particular query, the
idagent_pointdate (idAgent, PointDate) is preferred because it has the range at the end, while for pointdate_idagent (PointDate, idAgent`), only the first row can be used.

Code Snippets

root@127.0.0.1(test)> EXPLAIN SELECT idPosition , idAgent , LatitudePoint , LongitudePoint , TimeStamp AS StartTime 
    -> FROM `agentpositions` `t` IGNORE INDEX(`idAgent`, PRIMARY)
    -> WHERE (
    ->    t.idAgent=880 AND t.PointDate >= "2015-04-04" 
    ->    AND (t.TimeStamp >= "2015-04-04 00:00:00" and t.TimeStamp < "2015-04-04 23:59:59")) 
    ->    AND t.LatitudePoint != 0 and t.LongitudePoint != 0  
    -> ORDER BY t.TimeStamp\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
   partitions: NULL
         type: range
possible_keys: idagent_pointdate
          key: idagent_pointdate
      key_len: 9
          ref: NULL
         rows: 1
     filtered: 7.11
        Extra: Using index condition; Using where; Using filesort
1 row in set, 1 warning (0.00 sec)
root@127.0.0.1(test)> EXPLAIN SELECT idPosition , idAgent , LatitudePoint , LongitudePoint , TimeStamp AS StartTime
    -> FROM agentpositions t IGNORE INDEX(idAgent, PRIMARY)
    -> WHERE
    -> t.idAgent=880 AND t.PointDate in ("2015-04-04","2015-04-05","2015-04-03" )
    -> AND t.LatitudePoint != 0 and t.LongitudePoint != 0
    -> order by t.TimeStamp\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
   partitions: NULL
         type: ALL
possible_keys: idagent_pointdate
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5
     filtered: 38.40
        Extra: Using where; Using filesort
1 row in set, 1 warning (0.01 sec)

Context

StackExchange Database Administrators Q#97107, answer score: 4

Revisions (0)

No revisions yet.