From the T-SQL code below you can see that dash -
on the 3rd place of a nvarchar
string does not equal dash on 3rd place of a varchar
string.
if substring(N'BV‐D',3,1) <> substring('BV‐D',3,1)
select substring(N'BV‐D',3,1) as nchar
, substring('BV‐D',3,1) as char
, ASCII(substring(N'BV‐D',3,1)) as ncode
, ASCII(substring('BV‐D',3,1)) as code
, 'Non-Equal' as status
output:
nchar | char | ncode | code | status |
---|---|---|---|---|
‐ | - | 45 | 45 | Non-Equal |
If I will retype '-' or add N, or remove N in both cases strings will become equal. ASCII code is 45 in both cases.
Why are they not equal?
Initially it was not typed manually, but copied from different sources.
3条答案
按热度按时间bqjvbblv1#
Because they aren't equal. You have a string which contains non-ASCII charcters that you're putting into a
varchar
and then (implicitly) converting that value back to annvarchar
. By that point it's too late and information has been lost.Let's look at these values from a
varbinary
perspective:This gives the results:
Notice the third character for the first string has the representation
0x1020
. Such a character cannot be represented in avarchar
, and when you attempt to store it in one it is implicitly converted to a similar character,0x2D
. Then when you convert that back to annvarchar
you get0x2D00
. (Note, not all characters are converted to similar characters. Those that aren't are replaced with?
.)Unsurprisingly,
0x2D00
and0x1020
are not the same, and so the strings are not equal.sqyvllje2#
This is the ASCII-45 HYPHEN-MINUS. May I comment that this is the more common hyphen. It's what appears if I press the hyphen key in my US keyboard.
This is the Unicode-8208 HYPHEN, well beyong the ASCII Range. It is what your 'BV‐D' string contains. If you don't use a unicode string with
N
, sql server converts it to the similar ASCII-45 character. Still, they differ, and that's the reason you get an inequality.km0tfn4u3#
That's not the dash you assume it is, it's a different Unicode character. The query is using the wrong function though,
ASCII
, which does not return the actual Unicode code.If you try
You'll get
8208 is the Hyphen character . That's the character generated by eg HTML's
‐
, used to separate words and syllable's. It's not the same as the minus character on our keyboards.There are a lot of dash characters used in typography and hence Unicode. Full width, half width, breaking or non-breaking etc.
Fixing this
You can use TRANSLATE to map individual characters :
This trick is used in SQL Server to normalize equivalent Unicode characters because T-SQL doesn't have a NORMALIZE function. This wouldn't help here anyway, because HYPHEN-MINUS and HYPHEN aren't considered equivalent. You'd have to add all possible options though, as the example in the linked question shows :