HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMajor

Seemingly identical WHERE clauses returning different results

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
whereidenticaldifferentreturningseeminglyresultsclauses

Problem

I have a table named [User]. Below is a simple query to find one user from the table:

SELECT  [Username]
      ,[FirstName]
      ,[LastName]
      ,[EmailAddress]
      ,[IsStaff]
      ,[ExternalID]
      ,[LastLogin]
FROM [dbo].[User]
WHERE [Username] = 'pskalhaq' -- 1
--WHERE [Username] = '‏pskalhaq' -- 2


When I run this query I get a single record back, which is what I expect. When I comment out the WHERE clause I've labelled as "1" and uncomment the clause labelled as "2", I get no results. This is bizarre because the values between the inverted commas are identical.

I discovered this problem because a web application I've written which uses Entity Framework to get a record from another table, [Appointment], was throwing an error due to there not being a related record in the [User] table, even though the [Appointment] table contains a non-nullable foreign key joining to the [User] table. This seemed impossible. I ran the equivalent query in SSMS (2892 is the ID of the appointment record that was giving me trouble):

SELECT a.[ID] AS [AppointmentID]
    ,a.[StudentUsername] -- This is the foreign key in the Appointment table
    ,u.[Username] -- This is the primary key in the User table
FROM [dbo].[Appointment] a
JOIN [dbo].[User] u 
ON u.[Username] = a.[StudentUsername]
WHERE a.[ID] = 2892


This returned a result, as expected. So the [Appointment] record DOES have a linked [User] record. At this point I decided this must be a bug in Entity Framework (spoiler: It seems not to be, that's why I'm posting here and not on Stack Overflow!).

But bizarrely, if I copy the value from either the [StudentUsername] or the [Username] column in the results of the query and paste the value into the WHERE clause in my first query above, I get no results from the [User] table. The same is true if I manually type the username into the query.

At this point I thought I was losing my mind, so I ran a query on the [User] table to retur

Solution

They are different strings.

select cast(N'pskalhaq' as varbinary(200)) 
union all
select cast(N'‏pskalhaq' as varbinary(200))


outputs

0x700073006B0061006C00680061007100
0x0F20700073006B0061006C00680061007100


The second one has U+200F* the unicode RIGHT-TO-LEFT MARK as the first character, so it will not compare as equal in a binary collation, and perhaps in other collations too.

And the strings also don't compare equal when converted to varchar, as in this query where the string is a non-unicode literal. So the unicode characters in the query text will be converted to the database collation's code page as a varchar.

select '‏pskalhaq'


outputs

?pskalhaq


U+200F is mapped to 0x3F, the ordinary "?" in the single-byte Latin code page. Then before comparing against an nvarchar column the varchar value must be converted to nvarchar according to the rules of Data Type Precedence.

And when converted to nvarchar the "?" stays a "?", and the string becomes N'?pskalhaq' which doesn't match either N'pskalhaq' or N'‏pskalhaq'.

*0x0F20 = U+200F because the characters are stored in little-endian byte order.

Code Snippets

select cast(N'pskalhaq' as varbinary(200)) 
union all
select cast(N'‏pskalhaq' as varbinary(200))
0x700073006B0061006C00680061007100
0x0F20700073006B0061006C00680061007100
select '‏pskalhaq'

Context

StackExchange Database Administrators Q#299410, answer score: 23

Revisions (0)

No revisions yet.