patternsqlMinor
Treating certain Arabic characters as identical
Viewed 0 times
treatingidenticalcharactersarabiccertain
Problem
In Arabic we have characters like ا (alef) and أ (alef with hamza).
Users write them interchangeably and we want to search them interchangeably. SQL Server treats them as separate characters. How can I make SQL treat them as the same character?
I thought to replace any أ (alef with hamza) with ا (alef) at insertion but we have a lot of alternatives in Arabic language not just ا (alef) and أ (alef with hamza).
I tried
Here is a script to regenerate the issue:
The result is:
The desired result would be both of the rows we inserted.
Users write them interchangeably and we want to search them interchangeably. SQL Server treats them as separate characters. How can I make SQL treat them as the same character?
I thought to replace any أ (alef with hamza) with ا (alef) at insertion but we have a lot of alternatives in Arabic language not just ا (alef) and أ (alef with hamza).
I tried
Arabic_CI_AS and Arabic_CI_AI but that doesn't solve the problem.Here is a script to regenerate the issue:
CREATE TABLE [dbo].[TestTable] (
[ArabicChars] [nvarchar](50) NOT NULL,
CONSTRAINT [PK_TestTable] PRIMARY KEY CLUSTERED
(
[ArabicChars] ASC
)
) ON [PRIMARY];
INSERT INTO TestTable values (N'احمد');
INSERT INTO TestTable values (N'أحمد');
SELECT *
FROM TestTable
WHERE ArabicChars like N'ا%';The result is:
ArabicChars
احمد
(1 row(s) affected)The desired result would be both of the rows we inserted.
Solution
i did few tests and i guess it is a work around but can get your job done, since SQL it self isn't helping much.
if you notice that the unicodes of these characters are close to each other
so between أ and ا , its from 1571 to 1575
or if you want to make sure you get every thing in between
make sure you include from 1569 to 1575
which are
So to make sure that you include every thing similar in your search you can use regular expressions
so in this case you get all characters between ء and ا
which include all those between 1569 to 1575
so in this case if your table has
the query above will get them all.
but you will notice something funny
if you have your column as a primary key
you wont be able to insert these 2 records
because the ء,أ,إ
are all to SQL are part of hamza which is ء
So if you run the query
it will show you
so to get the long story short
to SQL أ is not = to ا because its 2 different letters
hamza and alefp
but ء = آ = أ = ؤ = إ = ئ
they are all Hamza ء
if you notice that the unicodes of these characters are close to each other
select unicode(N'أ')
= 1571
select unicode(N'ا')
= 1575
select unicode(N'إ')
= 1573so between أ and ا , its from 1571 to 1575
or if you want to make sure you get every thing in between
make sure you include from 1569 to 1575
which are
Select NCHAR(1569) = ء
Select NCHAR(1570) = آ
Select NCHAR(1571) = أ
Select NCHAR(1572) = ؤ
Select NCHAR(1573) = إ
Select NCHAR(1574) = ئ
Select NCHAR(1575) = اSo to make sure that you include every thing similar in your search you can use regular expressions
SELECT *
FROM TestTable
WHERE ArabicChars like '%[ء-ا]%'so in this case you get all characters between ء and ا
which include all those between 1569 to 1575
so in this case if your table has
CREATE TABLE [dbo].[TestTable] (
[ArabicChars] [nvarchar](50) COLLATE Arabic_CI_AI NOT NULL,
)
INSERT INTO TestTable values (N'احمد');
INSERT INTO TestTable values (N'أحمد');
INSERT INTO TestTable values (N'إحمد');the query above will get them all.
but you will notice something funny
if you have your column as a primary key
CREATE TABLE [dbo].[TestTable] (
[ArabicChars] [nvarchar](50) COLLATE Arabic_CI_AI NOT NULL,
CONSTRAINT [PK_TestTable] PRIMARY KEY CLUSTERED
(
[ArabicChars] ASC
)
) ON [PRIMARY];you wont be able to insert these 2 records
INSERT INTO TestTable values (N'أحمد');
INSERT INTO TestTable values (N'إحمد');
INSERT INTO TestTable values (N'ءحمد');because the ء,أ,إ
are all to SQL are part of hamza which is ء
So if you run the query
SELECT *
FROM TestTable
WHERE ArabicChars like 'ء%'it will show you
أحمد
إحمدso to get the long story short
to SQL أ is not = to ا because its 2 different letters
hamza and alefp
but ء = آ = أ = ؤ = إ = ئ
they are all Hamza ء
Code Snippets
select unicode(N'أ')
= 1571
select unicode(N'ا')
= 1575
select unicode(N'إ')
= 1573Select NCHAR(1569) = ء
Select NCHAR(1570) = آ
Select NCHAR(1571) = أ
Select NCHAR(1572) = ؤ
Select NCHAR(1573) = إ
Select NCHAR(1574) = ئ
Select NCHAR(1575) = اSELECT *
FROM TestTable
WHERE ArabicChars like '%[ء-ا]%'CREATE TABLE [dbo].[TestTable] (
[ArabicChars] [nvarchar](50) COLLATE Arabic_CI_AI NOT NULL,
)
INSERT INTO TestTable values (N'احمد');
INSERT INTO TestTable values (N'أحمد');
INSERT INTO TestTable values (N'إحمد');CREATE TABLE [dbo].[TestTable] (
[ArabicChars] [nvarchar](50) COLLATE Arabic_CI_AI NOT NULL,
CONSTRAINT [PK_TestTable] PRIMARY KEY CLUSTERED
(
[ArabicChars] ASC
)
) ON [PRIMARY];Context
StackExchange Database Administrators Q#14153, answer score: 6
Revisions (0)
No revisions yet.