patternsqlMajor
Latin1_General_BIN performance impact when changing the database default collation
Viewed 0 times
impactthelatin1_general_bincollationdatabasedefaultperformancewhenchanging
Problem
I have set the database collation to
Latin1_General_BIN, to make string comparisons case-sensitive. Will this have an impact on performance? Will it have any impact on DML or DDL operations in the database? The database already exists with tables in it.Solution
Collations in SQL Server determine the rules for matching and sorting character data. Normally, you would choose a collation first based on the comparison semantics and sorting order the consumers of the data require.
Humans generally do not find that binary collations produce the sorting and comparison behaviours they expect. So, although these offer the best performance (especially the pure code-point BIN2 versions) most implementations do not use them.
Next in raw performance terms (but only for non-Unicode strings) are the backward-compatibility SQL collations. When working with Unicode data, these collations use a Windows collation instead, with the same performance characteristics. There are subtle traps here, so you need to have good reasons to choose a SQL collation these days (unless working on a US system, where it is still the default).
Windows collations are the slowest, in general, because of the complex Unicode comparison and sorting rules. Nevertheless, these offer complete compatibility with Windows in SQL Server, and are regularly maintained to keep up with changes in the Unicode standard. For modern use that includes Unicode data, a Windows collation is generally recommended.
TL;DR
If all you want is case-sensitive comparison and sorting semantics, you should choose the
You can see these definitions using sys.fn_helpcollations
Examples
Four tables that are exactly the same except for the collation; one binary, one case-sensitive, one case-insensitive, and one SQL case-sensitive:
Same sample data for each table:
Now we want to find strings greater than 'a':
Results:
Finally...
Note though, if we use a Unicode literal with the SQL collation, the implicit conversion rules result in a Windows collation comparison:
...and the SQL collation results change:
Humans generally do not find that binary collations produce the sorting and comparison behaviours they expect. So, although these offer the best performance (especially the pure code-point BIN2 versions) most implementations do not use them.
Next in raw performance terms (but only for non-Unicode strings) are the backward-compatibility SQL collations. When working with Unicode data, these collations use a Windows collation instead, with the same performance characteristics. There are subtle traps here, so you need to have good reasons to choose a SQL collation these days (unless working on a US system, where it is still the default).
Windows collations are the slowest, in general, because of the complex Unicode comparison and sorting rules. Nevertheless, these offer complete compatibility with Windows in SQL Server, and are regularly maintained to keep up with changes in the Unicode standard. For modern use that includes Unicode data, a Windows collation is generally recommended.
TL;DR
If all you want is case-sensitive comparison and sorting semantics, you should choose the
_CS_ (for Case Sensitive) variation of whichever base collation provides the expected behaviour for your users' language and culture. For example, both these are case-sensitive collations:-- Latin1-General, case-sensitive, accent-sensitive
Latin1_General_CS_AS
-- Latin1-General, case-sensitive, accent-sensitive for Unicode Data,
-- SQL Server Sort Order 51 on Code Page 1252 for non-Unicode Data
SQL_Latin1_General_CP1_CS_AS
You can see these definitions using sys.fn_helpcollations
Examples
Four tables that are exactly the same except for the collation; one binary, one case-sensitive, one case-insensitive, and one SQL case-sensitive:
CREATE TABLE #Example_BIN
(
string nvarchar(50)
COLLATE Latin1_General_BIN
NOT NULL
);
CREATE TABLE #Example_CS
(
string nvarchar(50)
COLLATE Latin1_General_CS_AI
NOT NULL
);
CREATE TABLE #Example_CI
(
string nvarchar(50)
COLLATE Latin1_General_CI_AI
NOT NULL
);
CREATE TABLE #Example_SQL
(
string varchar(50) -- Note varchar
COLLATE SQL_Latin1_General_CP1_CS_AS
NOT NULL
);Same sample data for each table:
INSERT #Example_BIN
(string)
VALUES
(N'A'),
(N'a'),
(N'B'),
(N'b'),
(N'C'),
(N'c');
INSERT #Example_CS
SELECT EB.string
FROM #Example_BIN AS EB;
INSERT #Example_CI
SELECT EB.string
FROM #Example_BIN AS EB;
INSERT #Example_SQL
SELECT EB.string
FROM #Example_BIN AS EB;Now we want to find strings greater than 'a':
SELECT EB.string AS BIN
FROM #Example_BIN AS EB
WHERE EB.string > N'a'
ORDER BY EB.string;
SELECT EC.string AS CS
FROM #Example_CS AS EC
WHERE EC.string > N'a'
ORDER BY EC.string;
SELECT EC2.string AS CI
FROM #Example_CI AS EC2
WHERE EC2.string > N'a'
ORDER BY EC2.string;
SELECT ES.string AS SQL
FROM #Example_SQL AS ES
WHERE ES.string > 'a' -- not Unicode
ORDER BY ES.string;Results:
╔═════╗
║ BIN ║
╠═════╣
║ b ║
║ c ║
╚═════╝
╔════╗
║ CS ║
╠════╣
║ A ║
║ b ║
║ B ║
║ c ║
║ C ║
╚════╝
╔════╗
║ CI ║
╠════╣
║ B ║
║ b ║
║ C ║
║ c ║
╚════╝
╔═════╗
║ SQL ║
╠═════╣
║ B ║
║ b ║
║ C ║
║ c ║
╚═════╝Finally...
Note though, if we use a Unicode literal with the SQL collation, the implicit conversion rules result in a Windows collation comparison:
SELECT ES.string AS SQL
FROM #Example_SQL AS ES
WHERE ES.string > N'a'
ORDER BY ES.string;...and the SQL collation results change:
╔═════╗
║ SQL ║
╠═════╣
║ A ║
║ B ║
║ b ║
║ C ║
║ c ║
╚═════╝Code Snippets
CREATE TABLE #Example_BIN
(
string nvarchar(50)
COLLATE Latin1_General_BIN
NOT NULL
);
CREATE TABLE #Example_CS
(
string nvarchar(50)
COLLATE Latin1_General_CS_AI
NOT NULL
);
CREATE TABLE #Example_CI
(
string nvarchar(50)
COLLATE Latin1_General_CI_AI
NOT NULL
);
CREATE TABLE #Example_SQL
(
string varchar(50) -- Note varchar
COLLATE SQL_Latin1_General_CP1_CS_AS
NOT NULL
);INSERT #Example_BIN
(string)
VALUES
(N'A'),
(N'a'),
(N'B'),
(N'b'),
(N'C'),
(N'c');
INSERT #Example_CS
SELECT EB.string
FROM #Example_BIN AS EB;
INSERT #Example_CI
SELECT EB.string
FROM #Example_BIN AS EB;
INSERT #Example_SQL
SELECT EB.string
FROM #Example_BIN AS EB;SELECT EB.string AS BIN
FROM #Example_BIN AS EB
WHERE EB.string > N'a'
ORDER BY EB.string;
SELECT EC.string AS CS
FROM #Example_CS AS EC
WHERE EC.string > N'a'
ORDER BY EC.string;
SELECT EC2.string AS CI
FROM #Example_CI AS EC2
WHERE EC2.string > N'a'
ORDER BY EC2.string;
SELECT ES.string AS SQL
FROM #Example_SQL AS ES
WHERE ES.string > 'a' -- not Unicode
ORDER BY ES.string;╔═════╗
║ BIN ║
╠═════╣
║ b ║
║ c ║
╚═════╝
╔════╗
║ CS ║
╠════╣
║ A ║
║ b ║
║ B ║
║ c ║
║ C ║
╚════╝
╔════╗
║ CI ║
╠════╣
║ B ║
║ b ║
║ C ║
║ c ║
╚════╝
╔═════╗
║ SQL ║
╠═════╣
║ B ║
║ b ║
║ C ║
║ c ║
╚═════╝SELECT ES.string AS SQL
FROM #Example_SQL AS ES
WHERE ES.string > N'a'
ORDER BY ES.string;Context
StackExchange Database Administrators Q#110911, answer score: 28
Revisions (0)
No revisions yet.