Text content of a file stored in a String is not converting Unicode to ISO_8859_1

Posted by pgpifvop on 2021-08-20 in Java

I am trying to convert Unicode to ISO_8859_1.

String myString = "\u00E9checs";
byte[] bytesOfString = myString.getBytes();
String encoded_String = new String(bytesOfString, StandardCharsets.ISO_8859_1);
System.out.println(encoded_String);

Output:

échecs

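So far so good, but when I try to convert the same text stored in a file, it is not converted and is printed as is. Here is the code that reads from the file and performs the conversion.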

String path = "st.txt"; // where st.txt contains only one line, i.e. \u00E9checs
FileInputStream inputStream = null;
Scanner sc = null;
try {
    inputStream = new FileInputStream(path);
    sc = new Scanner(inputStream);
    while (sc.hasNextLine()) {
        byte[] bytesOfString = sc.nextLine().getBytes();
        String encoded_String = new String(bytesOfString, StandardCharsets.ISO_8859_1);
        System.out.println(encoded_String);
    }

    if (sc.ioException() != null) {
        throw sc.ioException();
    }
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
    if (sc != null) {
        sc.close();
    }
}

Output:

\u00E9checs

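Note: this is test code, so the file contains only one line. I need to apply the same process to a large file, which is why I use the Scanner class to keep memory usage low.
Can anyone tell me how to get the same result for text read from a file as I get when the Unicode is declared directly in a Java String variable?
Thank you in advance.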

Java unicode iso-8859-1

Source: https://stackoverflow.com/questions/68243617/text-content-of-file-store-in-string-not-converting-unicode-to-iso-8859-1

1 Answer

htzpubme1#

This is the problem:

byte[] bytesOfString = sc.nextLine().getBytes();
String encoded_String = new String(bytesOfString, StandardCharsets.ISO_8859_1);

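So:
1. The file contains bytes that are presumably encoded in ISO-8859-1.
2. The Scanner, created without a charset argument, decodes them with the platform default charset and hands you a Unicode String.
3. getBytes() then encodes that String back into bytes, again with the platform default charset (UTF-8 on many systems).
4. new String(bytes, StandardCharsets.ISO_8859_1) then decodes those bytes into a String, pretending they are ISO-8859-1.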
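To see why mixing charsets in such a round trip matters, here is a minimal sketch. It assumes the default charset in the steps above is UTF-8; the class and method names (`CharsetRoundTrip`, `reinterpret`) are made up for illustration and are not part of the original code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    /** Encodes text with one charset, then decodes the resulting bytes with another. */
    static String reinterpret(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String original = "\u00E9checs"; // e-acute followed by "checs"

        // Same charset in both directions: the text survives unchanged.
        String same = reinterpret(original,
                StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);
        System.out.println(same.equals(original)); // true

        // Encoded as UTF-8 (e-acute becomes the two bytes C3 A9) but decoded as
        // ISO-8859-1: each byte becomes its own character, so one character
        // turns into two (U+00C3 and U+00A9).
        String garbled = reinterpret(original,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals("\u00C3\u00A9checs")); // true
        System.out.println(original.length() + " -> " + garbled.length()); // 6 -> 7
    }
}
```

The booleans are printed instead of the strings themselves so that the result does not depend on the console's own encoding.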
You should use a Scanner that expects ISO-8859-1 input:

new Scanner(inputStream, StandardCharsets.ISO_8859_1);
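A fuller sketch of that constructor in use might look like the following. It makes a few assumptions: the file really is ISO-8859-1 encoded, the names `Latin1LineReader`, `forEachLine` and `firstLineOfSample` are hypothetical, and the charset is passed by name (`charset.name()`) because the `Scanner(InputStream, Charset)` overload only exists since Java 10, while the String overload works on older versions too.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;
import java.util.function.Consumer;

public class Latin1LineReader {
    /** Streams a file line by line, decoding its bytes with the given charset. */
    static void forEachLine(Path file, Charset charset, Consumer<String> action) {
        try (InputStream in = Files.newInputStream(file);
             Scanner sc = new Scanner(in, charset.name())) {
            while (sc.hasNextLine()) {
                action.accept(sc.nextLine()); // already decoded, no getBytes() needed
            }
            // Scanner swallows read errors, so surface them explicitly.
            if (sc.ioException() != null) {
                throw sc.ioException();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper: writes text as ISO-8859-1 to a temp file and reads the first line back. */
    static String firstLineOfSample(String text) {
        try {
            Path file = Files.createTempFile("st", ".txt");
            try {
                Files.write(file, (text + "\n").getBytes(StandardCharsets.ISO_8859_1));
                StringBuilder first = new StringBuilder();
                forEachLine(file, StandardCharsets.ISO_8859_1, line -> {
                    if (first.length() == 0) {
                        first.append(line);
                    }
                });
                return first.toString();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The byte E9 on disk comes back as the single character U+00E9.
        System.out.println(firstLineOfSample("\u00E9checs").equals("\u00E9checs")); // true
    }
}
```

Because lines are handed to a callback one at a time, only the current line is held in memory, which matches the asker's reason for choosing Scanner on a large file.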

nextLine will then perform the correct conversion; no further code juggling is needed.
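One case the charset fix does not cover: the question's output suggests that st.txt may literally contain the eleven ASCII characters `\u00E9checs` (a backslash, the letter u, four hex digits, then `checs`). Such escapes are only interpreted by the Java compiler inside source code, so no choice of charset turns them into é when read from a file. If that is the situation, the escapes have to be decoded explicitly. The following is a minimal sketch under that assumption; `UnicodeEscapeDecoder` and `decode` are hypothetical names, and it deliberately ignores corner cases such as doubled backslashes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeEscapeDecoder {
    // A literal backslash, the letter u, then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    /** Replaces each escape sequence in the line with the UTF-16 code unit it names. */
    static String decode(String line) {
        Matcher m = ESCAPE.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char unit = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement stops '$' and backslashes in the result
            // from being treated as replacement syntax.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(unit)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Eleven plain ASCII characters, as they would arrive from sc.nextLine().
        String fromFile = "\\u00E9checs";
        String decoded = decode(fromFile);
        System.out.println(fromFile.length() + " -> " + decoded.length()); // 11 -> 6
        System.out.println(decoded.equals("\u00E9checs")); // true
    }
}
```

Applied per line inside the Scanner loop, this keeps the streaming behaviour; text without escapes passes through unchanged.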

Answered 2021-08-20
