Varchar index - will hashing value make it faster?
Problem
I have a VARCHAR(1000) column in a table. It will contain strings that are not guaranteed to be unique. I have a query that searches this column as part of a WHERE IN clause; the IN ('...') list will contain approximately 100 values. The table will likely have millions of rows after the first few months. I understand that indexing this column could slow down inserts and could create quite a large index.
QUESTIONS
- Would it be faster to store a hash of the value as well and instead index and search on that?
- Does that even make sense if the values are not guaranteed to be unique?
- If hashing the values gives them a consistent length, would indexing that make queries faster?
I am running MySQL 5.1 with the InnoDB engine.
Solution
What you are asking is a little daunting. Here is why:
Would it be faster to store a hash of the value as well and instead index and search on that?
Creating a hash column and indexing sounds like a great idea. I have suggested that back on March 03, 2013: Possible INDEX on a VARCHAR field in MySql (See Suggestion #3)
Does that even make sense if the values are not guaranteed to be unique?
This would depend on the cardinality of that hash column. Since you said you will have millions of rows, let me express this in numerical terms:
Run SELECT COUNT(DISTINCT hashcolumn) ... against the table. For a one-million-row table, this count should be greater than 20. In other words, each distinct value should have no more than 50,000 rows (5% of the table rows). Any value that has more than 50,000 rows will likely cause the MySQL Query Optimizer to dismiss the index and prefer a full table scan for that hash value.
If hashing the values gives them a consistent length, would indexing that make queries faster?
I would say Yes and Perhaps at the same time. Why two answers? Indexing and using a hash column in place of a long column sounds brilliant against a MyISAM Table. You said you are using InnoDB.
When it comes to using Fixed vs Variable text, I would go with MyISAM over InnoDB. See:
Sep 26, 2012: Choosing MyISAM over InnoDB for these project requirements; and long term options
May 10, 2011: What is the performance impact of using CHAR vs VARCHAR on a fixed-size field?
Mar 25, 2011: Performance implications of MySQL VARCHAR sizes
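A minimal sketch of the hash-column idea discussed above, for MySQL 5.1. The table name docs, the columns id, val and val_hash, and the index name idx_val_hash are all hypothetical; MD5 is only one possible hash function, chosen here because MySQL has it built in. Treat this as an illustration under those assumptions rather than a tuned schema.

```sql
-- Hypothetical names: table docs, long column val, hash column val_hash.
-- MD5() returns 32 hex characters; UNHEX() packs them into 16 raw bytes.
ALTER TABLE docs ADD COLUMN val_hash BINARY(16) NULL;

-- Backfill existing rows. On a large InnoDB table this is one big
-- transaction, so it may be better done in batches by primary key range.
UPDATE docs SET val_hash = UNHEX(MD5(val));

-- Index the short fixed-length hash instead of the VARCHAR(1000) column.
ALTER TABLE docs ADD INDEX idx_val_hash (val_hash);

-- Lookup: the hash condition can use the small index, and the condition
-- on val discards any rows whose hashes happen to collide.
SELECT id, val
FROM docs
WHERE val_hash IN (UNHEX(MD5('first string')), UNHEX(MD5('second string')))
  AND val IN ('first string', 'second string');

-- Selectivity check: compare the number of distinct hashes to the row count.
SELECT COUNT(*) AS total_rows,
       COUNT(DISTINCT val_hash) AS distinct_hashes
FROM docs;
```

New and updated rows need val_hash set with the same expression, either in application code or in BEFORE INSERT and BEFORE UPDATE triggers. Note that MD5 works on the raw bytes of the string, so under a case-insensitive collation 'ABC' and 'abc' compare equal on val but produce different hashes; normalizing the value before hashing (for example with LOWER()) would be one way to keep the two conditions consistent.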
EPILOGUE
If the table is fairly-to-heavily used in transactions, the table must stay InnoDB. You could take better advantage of your idea in MyISAM, but you can still go forward with the hash idea. Please make sure the PRIMARY KEY is a single integer column (BIGINT if you know you will exceed 2 billion rows; otherwise, INT), since every InnoDB secondary index entry carries a copy of the primary key. I would do a major RAM upgrade and increase the InnoDB Buffer Pool size accordingly.
Context
StackExchange Database Administrators Q#41405, answer score: 2