gotchasqlModerate
Why does MySQL not have hash indices on MyISAM or InnoDB?
Viewed 0 times
indiceswhyinnodbhashmysqldoesmyisamnothave
Problem
I have an application that will only select on equality, and I figure I should use a hash index over a btree index. Much to my dismay, hash indices are not supported on MyISAM or InnoDB. What's up with that?
Solution
Many databases don't support hash based indexes at all.
In order for a hash table to be efficient you need to know the number of rows that are likely to be present otherwise the base hash table will be far too large (many empty entries, wasting space and potentially disk IO) or too small meaning that indirection is often used (possibly multiple levels of indirection, or even worse if the hash implementation is single-level you could end up performing a linear search over a fair number of records) at which point things are probably no more efficient then a tree based index anyway.
So to be generally useful (i.e. usually better than the alternative) the index needs to be rebuilt occasionally as data grows (and shrinks) which could add a significant intermittent overhead. This is usually fine with memory based tables as the rebuild is probably going to be pretty fast (as the data is always going to be in RAM and is not likely to be massive in any case), but rebuilding a large index on disk is a very heavy operation (and IIRC mySQL doesn't support live index rebuilds so holds a table lock during the operation).
Hence hash indexes are used in memory tables as there they are generally better performers, but disk based tables don't support them as they could be a detriment to performance not a bonus. There is nothing to stop hash indexes being made available for disk based tables of course, no doubt some databases do support the feature, but presumably they are not implemented in ISAM/InnoDB tables as the maintainers do not consider the feature worth adding (as the extra code to write and maintain is not worth the benefit in those few circumstances that it makes a significant difference). Perhaps if you strongly disagree you could talk to them and make a good case for the implementation of the feature.
If you are indexing large strings then implementing your own pseudo-hash index (by storing a hash of the value as well as the actual value, and indexing that has column) may work, but this is only definitely more efficient for large strings (where computing the hash value and searching the tree index by this value is always likely to be faster then just searching a tree index using the larger values for comparison, and the extra storage used is not going to be significant) so do some performance analysis before implementing this in production.
In order for a hash table to be efficient you need to know the number of rows that are likely to be present otherwise the base hash table will be far too large (many empty entries, wasting space and potentially disk IO) or too small meaning that indirection is often used (possibly multiple levels of indirection, or even worse if the hash implementation is single-level you could end up performing a linear search over a fair number of records) at which point things are probably no more efficient then a tree based index anyway.
So to be generally useful (i.e. usually better than the alternative) the index needs to be rebuilt occasionally as data grows (and shrinks) which could add a significant intermittent overhead. This is usually fine with memory based tables as the rebuild is probably going to be pretty fast (as the data is always going to be in RAM and is not likely to be massive in any case), but rebuilding a large index on disk is a very heavy operation (and IIRC mySQL doesn't support live index rebuilds so holds a table lock during the operation).
Hence hash indexes are used in memory tables as there they are generally better performers, but disk based tables don't support them as they could be a detriment to performance not a bonus. There is nothing to stop hash indexes being made available for disk based tables of course, no doubt some databases do support the feature, but presumably they are not implemented in ISAM/InnoDB tables as the maintainers do not consider the feature worth adding (as the extra code to write and maintain is not worth the benefit in those few circumstances that it makes a significant difference). Perhaps if you strongly disagree you could talk to them and make a good case for the implementation of the feature.
If you are indexing large strings then implementing your own pseudo-hash index (by storing a hash of the value as well as the actual value, and indexing that has column) may work, but this is only definitely more efficient for large strings (where computing the hash value and searching the tree index by this value is always likely to be faster then just searching a tree index using the larger values for comparison, and the extra storage used is not going to be significant) so do some performance analysis before implementing this in production.
Context
StackExchange Database Administrators Q#2817, answer score: 18
Revisions (0)
No revisions yet.