gotchaMinor
Storing special characters (German, French, Spanish) in SQL Server result in weird strings
Viewed 0 times
frenchresultserversqlweirdstringsspecialcharactersstoringspanish
Problem
I'm trying to store the German ß in SQL Server, but it results in ß.
For example "Gleimstraße" is stored as "Gleimstraße"
But this also happens for other characters:
I looked at the database collation and it shows
But I have no idea if my existing collation is causing the issue, or what I need to do to be able to store the special characters like ö, ß, é, ë, ç, ñ, í.
UPDATE 1
Indeed, I'm storing the strings in an
`
Dim myConnection As SqlClient.SqlConnection = GetConnection()
Dim cmd As New SqlClient.SqlCommand("UPDATE cities SET name=@name,updatedate=getdate() WHERE geonameid=@geonameid", myConnection)
cmd.Parameters.Add(New SqlClient.SqlParameter("@name", "Kąty Wrocławskie"))
cmd.Parameters.Add(
For example "Gleimstraße" is stored as "Gleimstraße"
But this also happens for other characters:
- "König-Karl-Straße" stored as "König-Karl-Straße"
- "Quai André-Citroën" stored as "Quai André Citroën"
- "Carrer dels Adreçadors stored as "Carrer dels Adreçadors"
I looked at the database collation and it shows
SQL_Latin1_General_CP1_CI_AS. I then Googled and found this.But I have no idea if my existing collation is causing the issue, or what I need to do to be able to store the special characters like ö, ß, é, ë, ç, ñ, í.
UPDATE 1
Indeed, I'm storing the strings in an
nvarchar column. I send the data to the database via my .NET application. This is what I have in my web.config as the connection string:`
UPDATE 2
I read values from a JSON file which I've passed into a Newtonsoft.Json.Linq.JArray and then pass it to a function, like below:
InsertStreetId(item.SelectToken("location.street").Value(Of String))
Public Shared Function InsertStreetId(ByVal street As String) As Integer
Dim streetId As Integer
Dim myConnection As SqlConnection = GetConnection()
Dim cmd As New SqlCommand("INSERT INTO geo_streets(streetname) VALUES (@streetname)" +
";SELECT CAST(scope_identity() as int);", myConnection)
cmd.Parameters.Add(New SqlParameter("@streetname", street))
Try
myConnection.Open()
streetId = CInt(cmd.ExecuteScalar)
Catch ex As Exception
Finally
myConnection.Close()
End Try
Return streetId
End Function
UPDATE 3
Ok, so here's what I did:
``Dim myConnection As SqlClient.SqlConnection = GetConnection()
Dim cmd As New SqlClient.SqlCommand("UPDATE cities SET name=@name,updatedate=getdate() WHERE geonameid=@geonameid", myConnection)
cmd.Parameters.Add(New SqlClient.SqlParameter("@name", "Kąty Wrocławskie"))
cmd.Parameters.Add(
Solution
Somehow the data is being sent over as UTF-8 bytes. Because your database collation is
You can see this for yourself by executing the following in SSMS or Visual Studio Code, etc:
One option to solve this would be to use a UTF-8 collation (i.e. one ending in
If you can't, or simply do not want to, change the database default collation to a
Update 2 of the Question indicates two possible suspects:
-
I read values from a JSON file which I've passed into a Newtonsoft.Json.Linq.JArray
and:
-
then pass it to a function:
Let's start with the second item since it's an issue even if it's not the source of this particular problem.
You are using the following constructor for
You should make that change first to see if it fixes the problem (and then fix all code that creates parameters without explicitly setting the datatype). While it will be an improvement in either case, I suspect that it won't actually solve the problem. I think that it would have to infer the type being XML in order to convert to UTF-8 encoding, and I don't believe that would happen with just a name and no actual XML in the value.
The other item of interest is reading the data from a file via Json.NET. I wouldn't be surprised if you need to explicitly set the output encoding else it defaults to UTF-8. It appears that the file is being read correctly since ß does indeed encode into UTF-8 as the bytes C3 9F.
So, if fixing how the
I highly recommend that you save the bytes of the value coming into your
I suspect you will see a value such as:
which is the UTF-16 LE representation of
SQL_Latin1_General_CP1_CI_AS, that uses code page 1252 (Latin1) for VARCHAR data. The ß character is encoded as bytes C3 9F. In code page 1252, those two bytes equate to the following characters: Ã and Ÿ.You can see this for yourself by executing the following in SSMS or Visual Studio Code, etc:
SELECT CHAR(0xC3) + CHAR(0x9F);
One option to solve this would be to use a UTF-8 collation (i.e. one ending in
_UTF8) as the database default collation, which you can do since you are using SQL Server 2019.If you can't, or simply do not want to, change the database default collation to a
_UTF8 collation, then you will need to find a way to change the encoding / culture / locale / collation of the connection to match your database code page (i.e. Windows 1252). Fortunately code page 1252 does store those characters (i.e. ö, ß, é, ë, ç, ñ, í ), but if you need to store a greater range of characters, you might need to either switch to a UTF-8 collation (if sticking with VARCHAR data), OR switch to using NVARCHAR columns / data (in which case you need to prefix your string literals, in the app, with an upper-case N, and/or specify the NVARCHAR instead of VARCHAR datatype for parameters, depending on how the data is being sent). Both of those options are Unicode, which can handle all characters. If switching to NVARCHAR data and columns (which could be a much larger change), the database connectivity software should assume the encoding to be UTF-16 Little Endian without you needing to specify that.Update 2 of the Question indicates two possible suspects:
-
I read values from a JSON file which I've passed into a Newtonsoft.Json.Linq.JArray
and:
-
then pass it to a function:
cmd.Parameters.Add(New SqlParameter("@streetname", street))
Let's start with the second item since it's an issue even if it's not the source of this particular problem.
You are using the following constructor for
SqlParameter: SqlParameter(String, Object) (which I'm guessing is the same as the infamous SqlParameterCollection.AddWithValue(String, Object) method). The problem here is that the datatype is inferred from the value, and sometimes mistakes can be made when trying to guess the datatype. It is always better to specify the datatype explicitly. And you know the datatype. So, use the following instead (be sure to adjust the size of the parameter to match the actual column if not 500):Dim param as New SqlParameter("@streetname", SqlDbType.NVarChar, 500)
param.Value = street
cmd.Parameters.Add(param)
You should make that change first to see if it fixes the problem (and then fix all code that creates parameters without explicitly setting the datatype). While it will be an improvement in either case, I suspect that it won't actually solve the problem. I think that it would have to infer the type being XML in order to convert to UTF-8 encoding, and I don't believe that would happen with just a name and no actual XML in the value.
The other item of interest is reading the data from a file via Json.NET. I wouldn't be surprised if you need to explicitly set the output encoding else it defaults to UTF-8. It appears that the file is being read correctly since ß does indeed encode into UTF-8 as the bytes C3 9F.
So, if fixing how the
SqlParameter is created doesn't fix the issue, then you need to look into setting the encoding of the JSON.I highly recommend that you save the bytes of the value coming into your
InsertStreetId method to a file for debugging, using something like:BitConverter.ToString( BitConverter.GetBytes( street ) )
I suspect you will see a value such as:
4700 6C00 6500 6900 6D00 7300 7400 7200 6100 C300 7801 6500which is the UTF-16 LE representation of
N'Gleimstraße'. If this is the case, then the issue is happening prior to the InsertStreetId method being called, which means this has nothing to do with SQL Server.Context
StackExchange Database Administrators Q#312885, answer score: 7
Revisions (0)
No revisions yet.