patternsqlMinor
What is the performance impact of indexing a frequently updated column?
Viewed 0 times
theimpactwhatcolumnindexingfrequentlyperformanceupdated
Problem
I just want to know if I'm doing this right. I've searched google for hours already and I can't find a straight answer.
I have a table with potentially a very large amount of inserts and updates. Any time an update is made to a row, the 'modified' column gets changed. My application needs to query this table (along with some joins and filtering), and get paginated results sorted by that 'modified' column. Everything I've read tells me that I need to index the column that is used for sorting. But I've also read many articles telling me that indexing a column that gets updated frequently is a bad idea, because the index needs to be rebuilt every time the value of an indexed column changes.
So what is the proper way to go about doing this? If I index the 'modified' column, will this lead to problems later on when the index becomes very large and it needs to constantly rebuild itself over and over again? If I don't index the 'modified' column, will my pagination queries eventually become so slow that my application becomes unusable? Or is there another solution I haven't thought of?
-EDIT- I am leaning towards using an index on the column. But what I'm still not sure about is what affect a large number of UPDATEs to the indexed column will have on the SELECT queries that use that index. I know that indexes slow down UPDATEs and INSERTs, but I'm not sure if that increased I/O workload from rebuilding the index so many times (EVERY time a row gets updated, which there would probably be dozens of at any given moment) would be a major bottleneck to the server's performance in general. Or is it not as big of a bottleneck as I'm thinking?
I have a table with potentially a very large amount of inserts and updates. Any time an update is made to a row, the 'modified' column gets changed. My application needs to query this table (along with some joins and filtering), and get paginated results sorted by that 'modified' column. Everything I've read tells me that I need to index the column that is used for sorting. But I've also read many articles telling me that indexing a column that gets updated frequently is a bad idea, because the index needs to be rebuilt every time the value of an indexed column changes.
So what is the proper way to go about doing this? If I index the 'modified' column, will this lead to problems later on when the index becomes very large and it needs to constantly rebuild itself over and over again? If I don't index the 'modified' column, will my pagination queries eventually become so slow that my application becomes unusable? Or is there another solution I haven't thought of?
-EDIT- I am leaning towards using an index on the column. But what I'm still not sure about is what affect a large number of UPDATEs to the indexed column will have on the SELECT queries that use that index. I know that indexes slow down UPDATEs and INSERTs, but I'm not sure if that increased I/O workload from rebuilding the index so many times (EVERY time a row gets updated, which there would probably be dozens of at any given moment) would be a major bottleneck to the server's performance in general. Or is it not as big of a bottleneck as I'm thinking?
Solution
(Assuming InnoDB...)
The data and the
An update requires modifying a record in the data BTree. If that also involves updating a column that is in any index it requires, effectively, a
On the one hand, you can argue that
Bottom line: It is a tradeoff between
(More)
After deleting a row from one spot in the BTree and inserting a new row in (perhaps) a different block, the blocks may need some splitting, etc. But, even in the worst case, rebalancing does not require regenerating the entire index. I think it is limited to a small multiple of LogN -- for example, in the rare case when it needs to do one block split for each level all the way up to the root node. A billion rows may have only 5 levels, so we are not talking a huge overhead.
Furthermore, for non-
The data and the
PRIMARY KEY are in one BTree. Each secondary INDEX (including UNIQUE indexes) is in its separate BTree.An update requires modifying a record in the data BTree. If that also involves updating a column that is in any index it requires, effectively, a
DELETE from that BTree plus an INSERT somewhere else in that same BTree.On the one hand, you can argue that
DELETE and INSERT are designed to be efficient, and we do it all the time. On the other hand, you can see that modifying an indexed column is a 'lot' of work under the covers.Bottom line: It is a tradeoff between
SELECT efficiency and UPDATE cost. (You have not provided enough info for us to give a simple yes/no.)(More)
After deleting a row from one spot in the BTree and inserting a new row in (perhaps) a different block, the blocks may need some splitting, etc. But, even in the worst case, rebalancing does not require regenerating the entire index. I think it is limited to a small multiple of LogN -- for example, in the rare case when it needs to do one block split for each level all the way up to the root node. A billion rows may have only 5 levels, so we are not talking a huge overhead.
Furthermore, for non-
UNIQUE indexes, the index update is buffered in the "Change Buffer". This collects index updates; later they are 'batched' together to for efficient storing to disk. Meanwhile, if the index is needed, the system looks both in the Change Buffer and the actual on-disk (but cached in the buffer_pool) index. The Change Buffer is allocated as a percentage of the buffer_pool (cf innodb_change_buffer_max_size, which is usually left at 25).Context
StackExchange Database Administrators Q#138815, answer score: 4
Revisions (0)
No revisions yet.