In MySQL/MariaDB, do Indexes' performance degrade as they become larger and larger?

Submitted by: @import:stackexchange-dba··

Problem

I'm currently exploring the use of PARTITION, for a specific use case I have.

I use InnoDB, file per table. MariaDB 10.8.

I was reading Rick's PARTITION Maintenance in MySQL webpage.

I'd like to highlight this bit:

WHERE X = 1234 -- This lets "partition pruning" look only in that one partition. But that's no better than INDEX(x) on a non-partitioned table. And you probably need that index anyway; after first 'pruning' down to the desired partition, you still need the index. No faster.

A common fallacy: "Partitioning will make my queries run faster". It won't. Ponder what it takes for a 'point query'. Without partitioning, but with an appropriate index, there is a BTree (the index) to drill down to find the desired row. For a billion rows, this might be 5 levels deep. With partitioning, first the partition is chosen and "opened", then a smaller BTree (of say 4 levels) is drilled down. Well, the savings of the shallower BTree is consumed by having to open the partition. Similarly, if you look at the disk blocks that need to be touched, and which of those are likely to be cached, you come to the conclusion that about the same number of disk hits is likely. Since disk hits are the main cost in a query, Partitioning does not gain any performance (at least for this typical case). The 2D case (below) gives the main contradiction to this discussion.

I totally understand what it means, but I have a question:

In MySQL/MariaDB, do Indexes' performance degrade as they become larger and larger?

For a billion rows, or for 100 billion rows, is a good Index always better than Partitions, in terms of Performance?

--

There is also this bit, which is closest to what I'm trying to benefit from:

Use case #3 -- Hot spot. This is a bit complicated to explain. Given this combination:

⚈ A table's index is too big to be cached, but the index for one partition is cacheable, and

⚈ The index is randomly accessed, and

⚈ Data ingestion would normally be I/O bound due to updating

Solution

For a billion rows, or for 100 billion rows, is a good Index always better than Partitions, in terms of Performance?

There are several things I can say about this.

- We can't make this generalization, because it depends on the query. In general, every kind of optimization is a great help to the right type of query, at the expense of other types of queries. So you must be very specific about which query you want to optimize before choosing the method of optimization.

- It's not an either-or choice. You can partition a table, and also define an index, so searches will be optimized in a given partition.

- I don't think you have 100 billion rows. If you did, you wouldn't be asking this question on Stack Exchange, you'd be assigning your full-time database architect team the task of optimizing it. They would undoubtedly come back with a design that uses many servers. It's impractical to store 100 billion rows in a single table. How would you back it up? How would you add a column?

InnoDB uses B-tree indexes (also fulltext and spatial indexes, but for this discussion we assume the default type of index).

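B-tree indexes have complexity O(log n) for both inserting and searching, where n is the number of entries in the data structure. Inserting or searching therefore does get more expensive as the index gets larger, but only logarithmically: multiplying the number of entries by 100 adds a roughly constant number of steps rather than multiplying the cost by 100.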

The I/O required by an index search is a function of the depth of the B-tree, that is, how many levels of non-terminal nodes must be traversed to reach the leaf node. The depth depends on how many index entries there are, and also on how large the data type of a given entry is: InnoDB page sizes are fixed, so only so many index entries can fit on a page. See: https://www.percona.com/blog/2009/04/28/the_depth_of_a_b_tree/
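To make the relationship between row count, entry size, and depth concrete, here is a minimal Python sketch. It is a back-of-the-envelope estimate, not how InnoDB actually lays out pages. The function name `estimated_btree_depth`, the 70% fill factor, and the idea that leaf and non-leaf entries are the same size are all assumptions made for illustration. The 16 KB page size is the InnoDB default.

```python
def estimated_btree_depth(rows, entry_bytes, page_bytes=16 * 1024, fill_factor=0.7):
    """Rough number of B-tree levels needed to index `rows` entries.

    Assumptions (illustrative only): every page is `fill_factor` full,
    and every entry, leaf or non-leaf, occupies `entry_bytes` bytes.
    """
    # Entries that fit on one page; this is the fanout of each node.
    fanout = max(2, int(page_bytes * fill_factor) // entry_bytes)

    depth = 1
    capacity = fanout  # entries reachable by a tree with `depth` levels
    while capacity < rows:
        capacity *= fanout  # each extra level multiplies capacity by the fanout
        depth += 1
    return depth


if __name__ == "__main__":
    for entry_bytes in (16, 100):
        for rows in (10**9, 10**11):
            levels = estimated_btree_depth(rows, entry_bytes)
            print(f"{rows:>15,} rows, {entry_bytes:>3}-byte entries: {levels} levels")
```

Under these assumptions, a 16-byte entry needs 4 levels for either a billion or 100 billion rows, while a 100-byte entry needs 5 and 6 levels respectively. Depth grows slowly with row count, and wider keys make the tree deeper.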

I/O cost can be mitigated by keeping subsets of the index pages in RAM, in the InnoDB buffer pool. But if the index grows much larger than RAM, there's not enough buffer pool to hold the whole index, so if you do searches randomly over the whole index, InnoDB is likely to evict pages that you will need again soon. Those pages will be re-loaded from storage when you need them, but this can lead to extra overhead as pages are swapped in and out of RAM.
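The eviction effect described above can be illustrated with a small simulation. This is a sketch under simplifying assumptions: it models the buffer pool as a plain LRU cache with uniformly random page reads, whereas InnoDB uses a midpoint-insertion variant of LRU. The function name `lru_hit_rate` and the page counts are hypothetical.

```python
import random
from collections import OrderedDict


def lru_hit_rate(index_pages, pool_pages, lookups=50_000, seed=1):
    """Fraction of uniformly random page reads served from a plain LRU cache."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    pool = OrderedDict()       # page number -> None, least recently used first
    hits = 0
    for _ in range(lookups):
        page = rng.randrange(index_pages)
        if page in pool:
            hits += 1
            pool.move_to_end(page)        # mark as most recently used
        else:
            pool[page] = None             # "read from storage" into the pool
            if len(pool) > pool_pages:
                pool.popitem(last=False)  # evict the least recently used page
    return hits / lookups


if __name__ == "__main__":
    # Index fits in the pool: nearly every read is a hit after warm-up.
    print("fits in pool:    ", lru_hit_rate(index_pages=1_000, pool_pages=1_000))
    # Index is 10x the pool: most random reads miss and must be re-loaded.
    print("10x pool size:   ", lru_hit_rate(index_pages=10_000, pool_pages=1_000))
```

With random access over the whole index, the hit rate in this model falls toward the ratio of pool size to index size. That is why a working set that fits in the buffer pool, such as the index of one hot partition, matters more than the total index size.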

Explicitly preloading indexes into a cache only applies to MyISAM. InnoDB caches pages on demand, which may include a subset of a given index. Forget about any manual command to load indexes into cache. To be honest, I recommend forgetting about MyISAM for any purpose. I haven't seen it used appropriately since the 2000s.

You asked about NVMe storage. NVMe is of course faster than old SATA interfaces, but how does it compare to RAM? It depends on what you measure, but for both access time and throughput (MB/second) you can count on RAM being at least several times faster than the latest generation of NVMe, and for access time the gap is much larger than that. Also, the InnoDB code is written to assume that pages must be in RAM before they can be read. It's still a win to keep data and index pages cached in RAM.
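One way to see why the buffer pool hit ratio still matters on fast storage is to compute a weighted average read time. The sketch below is only that arithmetic. The default latencies (0.1 microseconds for RAM, 100 microseconds for storage) are placeholder assumptions, not measurements of any particular hardware, and the function name `effective_read_time_us` is hypothetical. Substitute numbers measured on your own system.

```python
def effective_read_time_us(hit_ratio, ram_us=0.1, storage_us=100.0):
    """Average page read time in microseconds for a given buffer pool hit ratio.

    `ram_us` and `storage_us` are assumed, illustrative latencies.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    # Hits are served from RAM; misses pay the storage latency.
    return hit_ratio * ram_us + (1.0 - hit_ratio) * storage_us


if __name__ == "__main__":
    for ratio in (1.0, 0.99, 0.9, 0.5):
        print(f"hit ratio {ratio:.2f}: {effective_read_time_us(ratio):.3f} us per read")
```

Whatever the real latencies are, the shape is the same: as long as storage is noticeably slower than RAM, even a small miss rate dominates the average read time.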

I agree with Rick's general statement that partitioning is usually not going to help performance as much as you think it will. It is useful in the right scenario, but it's not a magic "everything go fast" solution. This is true of every other type of optimization too!

Context

StackExchange Database Administrators Q#321514, answer score: 10
