patternsqlMinor
Full Text Search Term Casing Affects Results
Viewed 0 times
fullsearchtextaffectstermresultscasing
Problem
I have been experiencing a full text problem when searching using
The table contains following rows:
When I try to get "Microsoft C#" using the following query:
...it yields the following results:
I tried using different casing. It yields the different result set:
Result:
The second one is what I expected. I thought SQL Server performed a case insensitive search to get the data, but it seems casing affects the full text results?
CONTAINS. I have the full text catalog accent sensitive setting set to false. The table contains following rows:
Microsoft C#
Microsoft Visual C
Microsoft CWhen I try to get "Microsoft C#" using the following query:
select name
from fulltexttable ft with (nolock)
where contains (
ft.name
,N'microsoft and c#'
)...it yields the following results:
Microsoft Visual C
Microsoft CI tried using different casing. It yields the different result set:
select *
from fulltexttable with (nolock)
where contains (
name
,N'microsoft and C#'
)Result:
Microsoft C#The second one is what I expected. I thought SQL Server performed a case insensitive search to get the data, but it seems casing affects the full text results?
Solution
This is documented although I couldn't find a reference in Books Online:
The rules for characters followed by nonalphanumeric characters are
somewhat convoluted (at least in English). The English word breaker
accepts the token C# and returns C#. The lowercase token c#, however,
is indexed as c with the # character stripped off. The uppercase token
C++ and lowercase token c++, on the other hand, are both indexed as
c++.
"Pro Full-Text Search in SQL Server 2008" - Michael Coles, Hilary Cotter, p106
You can check the behaviour of your search phrase using the sys.dm_fts_parser DMV, eg
As a workaround, you could force all strings through as upper-case, use some kind of mapping table for common search terms, or design a strategy to ensure non-alphanumeric characters are never stored or queried as part of the full-text index.
Note the behaviour is more consistent in SQL 2012 (where both lower and upper case will return only 'Microsoft C#').
The rules for characters followed by nonalphanumeric characters are
somewhat convoluted (at least in English). The English word breaker
accepts the token C# and returns C#. The lowercase token c#, however,
is indexed as c with the # character stripped off. The uppercase token
C++ and lowercase token c++, on the other hand, are both indexed as
c++.
"Pro Full-Text Search in SQL Server 2008" - Michael Coles, Hilary Cotter, p106
You can check the behaviour of your search phrase using the sys.dm_fts_parser DMV, eg
SELECT * FROM sys.dm_fts_parser ( 'microsoft AND c#', 1033, 0, 0 )
SELECT * FROM sys.dm_fts_parser ( 'microsoft AND C#', 1033, 0, 0 )As a workaround, you could force all strings through as upper-case, use some kind of mapping table for common search terms, or design a strategy to ensure non-alphanumeric characters are never stored or queried as part of the full-text index.
Note the behaviour is more consistent in SQL 2012 (where both lower and upper case will return only 'Microsoft C#').
Code Snippets
SELECT * FROM sys.dm_fts_parser ( 'microsoft AND c#', 1033, 0, 0 )
SELECT * FROM sys.dm_fts_parser ( 'microsoft AND C#', 1033, 0, 0 )Context
StackExchange Database Administrators Q#73470, answer score: 8
Revisions (0)
No revisions yet.