How to design indexes for columns with NULL values in MySQL?

Submitted by: @import:stackexchange-dba

Problem

I have a database with 40 million entries and want to run queries with the following WHERE clause

...
WHERE
  `POP1` IS NOT NULL 
  && `VT`='ABC'
  && (`SOURCE`='HOME')
  && (`alt` RLIKE '^[AaCcGgTt]$')
  && (`ref` RLIKE '^[AaCcGgTt]$')
  && (`AA` RLIKE '^[AaCcGgTt]$')
  && (`ref` = `AA` || `alt` = `AA`) LIMIT 10 ;

POP1 is a float column that can also be NULL. POP1 IS NOT NULL should exclude about 50% of the entries, which is why I put it at the beginning. All other terms reduce the number only marginally.

Among others, I designed an index pop1_vt_source, which does not seem to be used, while an index with vt as its first column is used. EXPLAIN output:

| id | select_type | table | type | possible_keys                          | key                 | key_len | ref         | rows     | Extra       |
|  1 | SIMPLE      | myTab | ref  | vt_source_pop1_pop2,pop1_vt_source,... | vt_source_pop1_pop2 | 206     | const,const | 20040021 | Using where |

Why is the index with pop1 as the first column not used? Is it because of the NOT, or because of NULL in general? How can I improve the design of my indexes and WHERE clauses? Even when limiting to 10 entries, the query takes more than 30 seconds, although the first 100 entries in the table should contain the 10 matches.

Solution

It is the IS NOT NULL condition:

CREATE TEMPORARY TABLE `myTab` (`notnul` FLOAT, `nul` FLOAT);
INSERT INTO `myTab` VALUES (1, NULL), (1, 2), (1, NULL), (1, 2), (1, NULL), (1, 2), (1, NULL), (1, 2), (1, NULL), (1, 2), (1, NULL), (1, 2);
SELECT * FROM `myTab`;


gives:

+--------+------+
| notnul | nul  |
+--------+------+
|      1 | NULL |
|      1 |    2 |
|      1 | NULL |
|      1 |    2 |
|      1 | NULL |
|      1 |    2 |
|      1 | NULL |
|      1 |    2 |
|      1 | NULL |
|      1 |    2 |
|      1 | NULL |
|      1 |    2 |
+--------+------+


Create the indexes:

CREATE INDEX `notnul_nul` ON `myTab` (`notnul`, `nul`);
CREATE INDEX `nul_notnul` ON `myTab` (`nul`, `notnul`);

SHOW INDEX FROM `myTab`;


gives:

+-------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| myTab |          1 | notnul_nul |            1 | notnul      | A         |          12 |     NULL | NULL   | YES  | BTREE      |         |               |
| myTab |          1 | notnul_nul |            2 | nul         | A         |          12 |     NULL | NULL   | YES  | BTREE      |         |               |
| myTab |          1 | nul_notnul |            1 | nul         | A         |          12 |     NULL | NULL   | YES  | BTREE      |         |               |
| myTab |          1 | nul_notnul |            2 | notnul      | A         |          12 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+


Now explain the SELECTs.
It seems that MySQL uses the index even if you use IS NOT NULL:

EXPLAIN SELECT * FROM `myTab` WHERE `notnul` IS NOT NULL;
+----+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+ 
| id | select_type | table | type  | possible_keys | key        | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+ 
|  1 | SIMPLE      | myTab | index | notnul_nul    | notnul_nul | 10      | NULL |   12 | Using where; Using index |
+----+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+

EXPLAIN SELECT * FROM `myTab` WHERE `nul` IS NOT NULL;
+----+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key        | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
|  1 | SIMPLE      | myTab | range | nul_notnul    | nul_notnul | 5       | NULL |    6 | Using where; Using index |
+----+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
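
For comparison, the same single-column test can be repeated with IS NULL. This is a minimal sketch that reuses the myTab table and the indexes created above. MySQL can generally treat IS NULL on an indexed nullable column like an equality lookup rather than a range, but the exact plan depends on the server version and the table statistics, so no output is shown here.

```sql
-- Reuses `myTab` and the `nul_notnul` index from the statements above.
-- Compare the `type` and `ref` columns with the IS NOT NULL plan.
EXPLAIN SELECT * FROM `myTab` WHERE `nul` IS NULL;
```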


But when comparing IS NULL with IS NOT NULL, it seems that MySQL prefers other indexes when IS NOT NULL is used, although the condition obviously does not add any information here. This is because MySQL treats IS NOT NULL as a range condition, as you can see in the type column. I'm not sure if there is a workaround:

EXPLAIN SELECT * FROM `myTab` WHERE `nul` IS NULL && `notnul`=2;
+----+-------------+-------+------+-----------------------+------------+---------+-------------+------+--------------------------+
| id | select_type | table | type | possible_keys         | key        | key_len | ref         | rows | Extra                    |
+----+-------------+-------+------+-----------------------+------------+---------+-------------+------+--------------------------+
|  1 | SIMPLE      | myTab | ref  | notnul_nul,nul_notnul | notnul_nul | 10      | const,const |    1 | Using where; Using index |
+----+-------------+-------+------+-----------------------+------------+---------+-------------+------+--------------------------+

EXPLAIN SELECT * FROM `myTab` WHERE `nul` IS NOT NULL && `notnul`=2;
+----+-------------+-------+-------+-----------------------+------------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys         | key        | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+-----------------------+------------+---------+------+------+--------------------------+
|  1 | SIMPLE      | myTab | range | notnul_nul,nul_notnul | notnul_nul | 10      | NULL |    1 | Using where; Using index |
+----+-------------+-------+-------+-----------------------+------------+---------+------+------+--------------------------+
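
Applied to the query in the question, the range behaviour suggests one thing to try: put the equality columns first and POP1 last, because MySQL stops using further key parts to narrow the scan once it reaches a range condition. What follows is only a sketch under assumptions. The index name vt_source_pop1 is hypothetical, the column types are not known here, and the index vt_source_pop1_pop2 that the optimizer already picked appears, judging by its name, to have this shape. The statements are meant for the real 40-million-row table, not the temporary demo table of the same name, and building an index on a table of that size can take a long time.

```sql
-- Sketch only: table and column names are taken from the question,
-- the index name `vt_source_pop1` is made up for illustration.
-- Equality columns first, the range-like column (POP1 IS NOT NULL) last.
CREATE INDEX `vt_source_pop1` ON `myTab` (`VT`, `SOURCE`, `POP1`);

-- Pin the POP1-first index from the question and note key_len and rows.
EXPLAIN SELECT * FROM `myTab` FORCE INDEX (`pop1_vt_source`)
WHERE `POP1` IS NOT NULL
  AND `VT` = 'ABC'
  AND `SOURCE` = 'HOME'
LIMIT 10;

-- Pin the equality-first index and compare the same two columns.
EXPLAIN SELECT * FROM `myTab` FORCE INDEX (`vt_source_pop1`)
WHERE `POP1` IS NOT NULL
  AND `VT` = 'ABC'
  AND `SOURCE` = 'HOME'
LIMIT 10;
```

FORCE INDEX is used here only to compare plans. If the two plans do not differ in rows or key_len, the index order is probably not the bottleneck, and the RLIKE filters, which cannot use these indexes, are worth examining next.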

Context

StackExchange Database Administrators Q#43062, answer score: 11
