
Which collation should I choose for a multi-language website?

Submitted by: @import:stackexchange-dba

Problem

Does a collation have any influence on query speed? Does the size of a table change depending on the collation?

If I want to build a website that must support all possible languages (take Google, for example), which collation would be recommended?

I will need to store characters such as 日本語, and searches on the website will have to return results for input such as sóméthíng. Matching must be case-insensitive as well.

How do I know which is the best choice to make? Which collation better suits this case?

Solution

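Generally speaking, one of the Unicode variants is probably the best choice for broad language support. UTF-8 uses fewer bytes per code point for ASCII-heavy text (1 byte, versus 2 in UTF-16), which gives it a slight advantage in any time/space tradeoffs you find yourself needing to make. The advantage is not universal: most CJK characters, such as 日本語, take 3 bytes each in UTF-8 but 2 in UTF-16. Both encodings can represent every Unicode code point, so neither leaves any language or script out. Note that the encoding only governs storage; comparison and sorting are governed by the collation, so for searches that ignore case and accents you will want a Unicode collation that is both case-insensitive and accent-insensitive.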
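To make the space tradeoff concrete, here is a minimal Python sketch that counts the bytes the question's sample strings occupy in UTF-8 and UTF-16. The helper name `encoded_sizes` is made up for this illustration, and the numbers describe the raw encodings only; a given database engine may add its own per-column or per-row overhead.

```python
import unicodedata

def encoded_sizes(text):
    """Return (code points, UTF-8 bytes, UTF-16 bytes) for text in NFC form."""
    # Normalize first so accented letters are counted as single code points.
    nfc = unicodedata.normalize("NFC", text)
    # "utf-16-le" is used so that no byte-order mark is added to the count.
    return (len(nfc), len(nfc.encode("utf-8")), len(nfc.encode("utf-16-le")))

for sample in ["something", "sóméthíng", "日本語"]:
    print(sample, encoded_sizes(sample))
# something (9, 9, 18)
# sóméthíng (9, 12, 18)
# 日本語 (3, 9, 6)
```

UTF-8 is the smaller encoding for the Latin samples, while UTF-16 is smaller for the Japanese one, so the better choice depends on which scripts dominate the stored text.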

This Wikipedia article might be enlightening on the dis/advantages of each.
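The question's requirement that sóméthíng match regardless of case and accents can be illustrated outside the database. The sketch below is only a rough approximation of what a case-insensitive, accent-insensitive collation does: it decomposes each string, drops combining marks, and case-folds the result. The name `fold_key` is hypothetical, and real database collations typically follow the Unicode Collation Algorithm with language-specific tailoring rather than this shortcut.

```python
import unicodedata

def fold_key(text):
    """Build a comparison key that ignores case and combining accents."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive, Unicode-aware form of lower().
    return stripped.casefold()

print(fold_key("sóméthíng") == fold_key("SOMETHING"))  # True
print(fold_key("日本語"))  # 日本語 (unchanged, nothing to fold)
```

In a database the same effect comes from choosing the collation on the column or in the query, so that indexes can be used for the comparison instead of computing a key per row.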

Context

StackExchange Database Administrators Q#255, answer score: 16
