patternsqlMinor
Why did adding a secondary index not increase select performance?
Viewed 0 times
whydidselectaddingsecondaryincreaseperformanceindexnot
Problem
I have a table in MySQL called
This table is supposed to hold chat messages between users of my application. As a result, the number of write operations on it will be high.
The application has an API that returns all the messages between two time stamps. This query will be fairly common too, but less than the number of write operations.
I ran mysqlslap with 100 concurrent connections and around 15000 rows in that table, and got a total time of around 8.6 seconds.
Then I added a secondary index on
Why did I not see a significant performance increase?
Edit:
this is what my table looks likes :
and this is how i added the index:
Typically, there will be around 150 to 350 messages returned from the select statements
messages, which looks like this : id (primary key) | description | created_at(timestamp)This table is supposed to hold chat messages between users of my application. As a result, the number of write operations on it will be high.
The application has an API that returns all the messages between two time stamps. This query will be fairly common too, but less than the number of write operations.
I ran mysqlslap with 100 concurrent connections and around 15000 rows in that table, and got a total time of around 8.6 seconds.
Then I added a secondary index on
created_at, hoping to get the results in much less time for my search between two times, but I got an increase of 0.3 seconds for the same inputs. Why did I not see a significant performance increase?
Edit:
this is what my table looks likes :
DROP TABLE IF EXISTS `mssg`;
CREATE TABLE `mssg` (
`id` INTEGER NULL AUTO_INCREMENT DEFAULT NULL,
`body` MEDIUMTEXT NULL DEFAULT NULL,
`length` VARCHAR NULL DEFAULT NULL,
`created_at` TIMESTAMP NULL DEFAULT NULL,
PRIMARY KEY (`id`)
);and this is how i added the index:
ALTER TABLE testalter_tbl ADD INDEX (created_at);Typically, there will be around 150 to 350 messages returned from the select statements
Solution
I'll skip my comments about that
The difference between 8.6 and 8.9 seconds is negligible. For all we know, the running times of the 2 tests are almost identical. If you want to test properly, you should run longer tests and with larger size of the table (try with 15K, 150K, 1500K, 15M rows) and see whether the efficiency changes or stays the same between having or not the specific index.
You should also examine the execution plans created for the queries in the 2 cases (at various table sizes), to see what is happening behind the scenes, what indexes are used, if any, etc.
My prediction would be that as the table grows larger, the index will get more useful, as the queries will require an index seek to find the (IDs of the) rows in the wanted time range and then some additional seeks from the clustering PK index (I assume you use InnoDB.)
Without the index, a whole table scan will have to be done each time. The larger the table gets, the biggest the difference you'll see between the 2 test cases (with and without the index).
If the vast majority of your queries is similar to the following, getting rows from a small time range:
then I would suggest (assuming you use InnoDB) an alternative design: make the
This way, all the rows that your queries need will be stored in consecutive pages of the clustering (PK) index and the efficiency of the queries - as long as they ask for a small time range - will not be affected by the size of the table.
Further comments, partially related to the question;
CREATE TABLE statement that would be classified from wrong to insulting, like having all columns declared NULL (what would a row with (NULL, NULL, NULL, NULL) mean?) and concentrate on the question asked.The difference between 8.6 and 8.9 seconds is negligible. For all we know, the running times of the 2 tests are almost identical. If you want to test properly, you should run longer tests and with larger size of the table (try with 15K, 150K, 1500K, 15M rows) and see whether the efficiency changes or stays the same between having or not the specific index.
You should also examine the execution plans created for the queries in the 2 cases (at various table sizes), to see what is happening behind the scenes, what indexes are used, if any, etc.
My prediction would be that as the table grows larger, the index will get more useful, as the queries will require an index seek to find the (IDs of the) rows in the wanted time range and then some additional seeks from the clustering PK index (I assume you use InnoDB.)
Without the index, a whole table scan will have to be done each time. The larger the table gets, the biggest the difference you'll see between the 2 test cases (with and without the index).
If the vast majority of your queries is similar to the following, getting rows from a small time range:
SELECT
FROM mssg
WHERE created_at >= @start_timestamp
AND created_at < @end_timestamp ;then I would suggest (assuming you use InnoDB) an alternative design: make the
(created_at) the clustering key of the table. Since you may have 2 or more rows with exact same timestamp, you can set the primary key to (create_at, id) for this effect (as InnoDB allows only unique keys for clustering). You will also need an additional index on (id) for the AUTO_INCREMENT to be allowed. This can be declared UNIQUE or not, it's your choice:CREATE TABLE mssg (
mssg_id INTEGER NOT NULL AUTO_INCREMENT,
body MEDIUMTEXT NULL,
length VARCHAR(20) NULL,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (created_at, mssg_id),
-- UNIQUE
INDEX mssg_id_ix (mssg_id) -- this is needed for the AutoIncrement
);This way, all the rows that your queries need will be stored in consecutive pages of the clustering (PK) index and the efficiency of the queries - as long as they ask for a small time range - will not be affected by the size of the table.
Further comments, partially related to the question;
- Why are all columns declared as
NULL? I suggest you change them all toNOT NULLexcept for those columns that there is a specific reason to allow nulls.
- Why
mediumtextfor the message? Are you going to store so large messages? It might be better for performance, if you could reduce the size, to sayVARCHAR(250).
- Why is the
lengthdeclared asVARCHARand notINTEGER?
- Further, if it's going to be storing the length of the text, it is not needed at all. You can get the length at any time using the proper text function.
- You could declare the
TIMESTAMPcolumn to get a defaultCURRENT_TIMESTAMPif you want (if it is not already provided by the application.)
Code Snippets
SELECT <columns_wanted>
FROM mssg
WHERE created_at >= @start_timestamp
AND created_at < @end_timestamp ;CREATE TABLE mssg (
mssg_id INTEGER NOT NULL AUTO_INCREMENT,
body MEDIUMTEXT NULL,
length VARCHAR(20) NULL,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (created_at, mssg_id),
-- UNIQUE
INDEX mssg_id_ix (mssg_id) -- this is needed for the AutoIncrement
);Context
StackExchange Database Administrators Q#125477, answer score: 6
Revisions (0)
No revisions yet.