How to fix ANSI characters in a SQL Server table and convert them to UTF-8

hgqdbh6s, posted on 2023-03-28 in SQL Server

I have a data import process that imports data from a CSV file into a table in SQL Server.

I have noticed that some columns contain accented characters.

For example, I have noticed the following text in the database table:
CAFÃ‰

I opened a new file in Notepad++, set the encoding to ANSI, and saved the file with the above text.

Then I changed the encoding to UTF-8.

The result was:
CAFÉ

I am not sure what has gone wrong, but is there any way to fix this problem in the database table?

I would like the database table to contain CAFÉ instead of CAFÃ‰.

This matters because when this column is displayed on the website, it still shows the string as CAFÃ‰ instead of CAFÉ, even though the web pages use UTF-8 encoding.

I have also checked the collation of the column:
SQL_Latin1_General_CP1_CI_AS
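A minimal Python sketch of what most likely went wrong, assuming the CSV file was saved as UTF-8 and the import decoded it as Windows code page 1252 (the code page that SQL_Latin1_General_CP1_CI_AS uses for varchar data). The Notepad++ experiment above is the same step run in reverse.

```python
# Assumption: the CSV was UTF-8 and the import decoded it as cp1252.
original = "CAFÉ"

# "É" (U+00C9) is stored in UTF-8 as the two bytes 0xC3 0x89.
utf8_bytes = original.encode("utf-8")   # b'CAF\xc3\x89'

# Decoding those bytes as cp1252 turns 0xC3 into "Ã" and 0x89 into "‰".
misread = utf8_bytes.decode("cp1252")   # 'CAFÃ‰'

# The garbled text still carries the original bytes, which is why it can be
# repaired by encoding it back to cp1252 and decoding the result as UTF-8.
assert misread.encode("cp1252") == utf8_bytes
```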

Thanks,

u91tlkcl (answer #1)

I had a similar problem, and I solved it by converting the file (which in my case was a SQL script) from UTF-8 to ANSI, since SQL Server was not reading it as UTF-8. It worked for me because I only needed characters that exist in ANSI.
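The file conversion described in this answer can be scripted. This is a minimal Python sketch, assuming the input file is UTF-8 (with or without a BOM) and that every character in it exists in code page 1252; utf8_to_cp1252 and convert_file are hypothetical helper names, and the paths are placeholders.

```python
def utf8_to_cp1252(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as Windows-1252 ("ANSI") bytes.

    Raises UnicodeDecodeError if the input is not valid UTF-8, and
    UnicodeEncodeError for any character that cp1252 cannot represent,
    rather than silently replacing it.
    """
    text = data.decode("utf-8-sig")  # "utf-8-sig" also strips a leading BOM
    return text.encode("cp1252")


def convert_file(src_path: str, dst_path: str) -> None:
    """Write a cp1252 copy of a UTF-8 file (both paths are placeholders)."""
    # Binary mode keeps line endings exactly as they are in the source file.
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(utf8_to_cp1252(data))
```

For example, utf8_to_cp1252("CAFÉ".encode("utf-8")) returns b"CAF\xc9", the single ANSI byte for É. Failing loudly on unsupported characters matches the answer's caveat that this only works when ANSI characters are enough.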

sauutmhj (answer #2)

Based on "SQL - UTF-8 to varchar/nvarchar Encoding issue":

Create a custom function as follows:

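CREATE FUNCTION dbo.convert_utf8(@utf8 VARBINARY(MAX))
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @rslt NVARCHAR(MAX);

    -- Wrap the raw bytes in an XML document that declares UTF-8 and holds the
    -- content in a CDATA section (so characters such as < and & do not break
    -- parsing); the XML parser then decodes the bytes as UTF-8.
    SELECT @rslt=
    CAST(
          --'<?xml version="1.0" encoding="UTF-8"?><![CDATA['
          0x3C3F786D6C2076657273696F6E3D22312E302220656E636F64696E673D225554462D38223F3E3C215B43444154415B
        + @utf8
        --']]>'
        + 0x5D5D3E
    AS XML).value('.', 'nvarchar(max)');

    RETURN @rslt;
END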
GO

Then update the corrupted field as follows:

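-- my_field must be VARCHAR: casting an NVARCHAR column would hand UTF-16 bytes to the function.
-- Characters with no cp1252 equivalent can be lost when the result is stored back in a VARCHAR column.
UPDATE [my_table] SET my_field = dbo.convert_utf8(CAST(my_field AS VARBINARY(MAX)));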

I tested it on SQL Server 2019.
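The function above reinterprets the bytes of the varchar value as UTF-8. If the repair is done in application code instead, the same idea looks roughly like this minimal Python sketch. fix_mojibake is a hypothetical name, and returning the value unchanged when decoding fails is a design choice added here, not part of the answer.

```python
def fix_mojibake(value: str) -> str:
    """Undo one round of "UTF-8 bytes read as cp1252" corruption.

    Encoding the garbled text as cp1252 gives back the original UTF-8
    bytes, and decoding those as UTF-8 recovers the intended text.
    If either step fails, the value was probably not garbled (or contains
    a character this sketch cannot handle), so it is returned unchanged.
    """
    try:
        return value.encode("cp1252").decode("utf-8")
    except UnicodeError:  # covers both UnicodeEncodeError and UnicodeDecodeError
        return value


rows = ["CAFÃ‰", "CAFÉ", "plain text"]
fixed = [fix_mojibake(row) for row in rows]
# fixed == ["CAFÉ", "CAFÉ", "plain text"]
```

Two limitations of this sketch: it leaves Á, Í, Ï, Ð and Ý unrepaired, because their second UTF-8 byte (0x81, 0x8D, 0x8F, 0x90 or 0x9D) has no mapping in Python's cp1252 codec, and it would also alter genuine text that happens to form valid UTF-8 when encoded as cp1252.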

jvidinwx (answer #3)

I had the same problem and ended up exporting the table to MySQL using the Integration Tools, then running the following query:

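-- Read the value as latin1 (MySQL's latin1 is Windows cp1252), take the raw bytes, and reinterpret them as UTF-8.
UPDATE myTable SET thefield = CONVERT(BINARY CONVERT(thefield USING latin1) USING utf8);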

This was the only way I was able to solve it.

kiayqfof (answer #4)

I came up with a solution by creating a mapping table between expected characters and actual characters, taken from this website: http://www.i18nqa.com/debug/utf8-debug.html

Once I had the mapping table, I joined it to my original table on values LIKE the actual characters and replaced those characters with the expected ones, handling 3-character sequences first, then 2, then 1:

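-- Longest sequences first, so a shorter sequence never splits a longer one.
-- UPDATE ... FROM changes each row at most once per statement, so if a name
-- can contain several different sequences, re-run each statement until it
-- affects 0 rows.
-- The binary collation keeps sequences that differ only in case apart: under a
-- case-insensitive collation 'ÃŠ' (from Ê) and 'Ãš' (from Ú) can match each other.
UPDATE rd
SET rd.Name = REPLACE(rd.Name COLLATE Latin1_General_BIN2, m.Actual, m.Expected)
FROM RawData rd
INNER JOIN dbo.UtfMapping m
    ON rd.Name COLLATE Latin1_General_BIN2 LIKE '%' + m.Actual + '%' AND LEN(m.Actual) = 3;

UPDATE rd
SET rd.Name = REPLACE(rd.Name COLLATE Latin1_General_BIN2, m.Actual, m.Expected)
FROM RawData rd
INNER JOIN dbo.UtfMapping m
    ON rd.Name COLLATE Latin1_General_BIN2 LIKE '%' + m.Actual + '%' AND LEN(m.Actual) = 2;

UPDATE rd
SET rd.Name = REPLACE(rd.Name COLLATE Latin1_General_BIN2, m.Actual, m.Expected)
FROM RawData rd
INNER JOIN dbo.UtfMapping m
    ON rd.Name COLLATE Latin1_General_BIN2 LIKE '%' + m.Actual + '%' AND LEN(m.Actual) = 1;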
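The mapping-table approach can also be sketched in Python. Instead of copying the table from i18nqa.com, the hypothetical build_mapping below derives each garbled sequence by encoding a character as UTF-8 and decoding it as cp1252; the character selection (U+00A0 to U+00FF plus a few punctuation marks) is an assumption for illustration. replace_longest_first mirrors the answer's 3-, 2-, then 1-character order.

```python
def build_mapping() -> dict:
    """Map each garbled sequence to the character it came from.

    The character selection (U+00A0..U+00FF plus a few punctuation marks)
    is an assumption for illustration, not the i18nqa.com table itself.
    """
    chars = [chr(cp) for cp in range(0xA0, 0x100)]
    chars += ["\u2013", "\u2018", "\u2019", "\u201c", "\u201d", "\u20ac"]
    mapping = {}
    for ch in chars:
        try:
            garbled = ch.encode("utf-8").decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec has no mapping for 0x81, 0x8D, 0x8F,
            # 0x90 and 0x9D, so characters such as U+00C1 and U+201D are skipped.
            continue
        mapping[garbled] = ch
    return mapping


def replace_longest_first(text: str, mapping: dict) -> str:
    """Replace garbled sequences, longest first, like the 3/2/1 UPDATEs."""
    for garbled in sorted(mapping, key=len, reverse=True):
        text = text.replace(garbled, mapping[garbled])
    return text


mapping = build_mapping()
word = "CAFÉ crème brûlée"
garbled_word = word.encode("utf-8").decode("cp1252")
restored = replace_longest_first(garbled_word, mapping)
# restored == word
```

Generating the table from the encodings keeps it consistent with the actual corruption, while the longest-first loop plays the role of the three separate UPDATE statements.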
