GSON: converting Unicode hex values to UTF-8

Posted by t5zmwmid on 2022-11-06 in Other

We have a JSON file that contains the following Unicode hex values.

[{
    "name": "Test",
    "description": "\\0x3059\\0x3079\\0x3066\\0x306E Test \\0x30DB\\0x30B9\\0x30C8\\0x7528\\0x306E\\0x30C7\\0x30D5\\0x30A9\\0x30EB\\0x30C8 \\0x30DB\\0x30B9\\0x30C8 \\0x30B0\\0x30EB\\0x30FC\\0x30D7\\0x3002"
  }
]

Is there a way to convert these Unicode hex values to UTF-8 when reading the file with the GSON library?
For example:
gson.fromJson(reader, JsonElement.class);

Answer #1 (afdcj2ne)

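If I'm not mistaken, your strings are UTF-16-like rather than UTF-8, since they consist of 16-bit codes prefixed with \\0x (I still don't fully get the magic of UTF-8 and Unicode). One possible solution is to buffer the entire input into a string, replace every \\0x with \u, and then let Gson parse the new string. I'm not sure this is a good approach, because the replacement is completely context-free and may corrupt strings that merely happen to contain that sequence; it also ignores JSON syntax entirely. Another solution is to extend JsonReader so that Gson itself decodes such strings without breaking the JSON syntax.
For example: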

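import com.google.gson.stream.JsonReader;

import java.io.IOException;
import java.io.Reader;
import java.util.function.Function;

import javax.annotation.WillClose;

// A JsonReader that passes every property name and string value through a decoder
public final class DecodingJsonReader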
        extends JsonReader {

    private final Function<? super CharSequence, String> nameDecoder;
    private final Function<? super CharSequence, String> stringDecoder;

    private DecodingJsonReader(@WillClose final Reader in, final Function<? super CharSequence, String> nameDecoder,
            final Function<? super CharSequence, String> stringDecoder) {
        super(in);
        this.nameDecoder = nameDecoder;
        this.stringDecoder = stringDecoder;
    }

    public static JsonReader create(@WillClose final Reader in, final Function<? super CharSequence, String> nameDecoder,
            final Function<? super CharSequence, String> stringDecoder) {
        return new DecodingJsonReader(in, nameDecoder, stringDecoder);
    }

    @Override
    public String nextName()
            throws IOException {
        return nameDecoder.apply(super.nextName());
    }

    @Override
    public String nextString()
            throws IOException {
        return stringDecoder.apply(super.nextString());
    }

}

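The decoder below does not handle the case where the four characters following the \0x prefix are not hexadecimal digits. Index bounds checking is delegated to the CharSequence in use (charAt throws if the input ends early). Of course, there is plenty of room for improvement here. Its StringBuilder is reused (not sure, but this might save some performance?).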

final class ZeroExDecoder {

    private final StringBuilder stringBuilder;

    private ZeroExDecoder(final StringBuilder stringBuilder) {
        this.stringBuilder = stringBuilder;
    }

    static ZeroExDecoder create(final StringBuilder stringBuilder) {
        return new ZeroExDecoder(stringBuilder);
    }

    String decode(final CharSequence cs) {
        stringBuilder.setLength(0);
        final int length = cs.length();
        for ( int i = 0; i < length; i++ ) {
            final char p0 = cs.charAt(i);
            if ( p0 != '\\' ) {
                stringBuilder.append(p0);
                continue;
            }
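            final char p1 = cs.charAt(++i);
            if ( p1 != '0' ) {
                // not a \0x sequence: keep the backslash and the character after it
                stringBuilder.append(p0).append(p1);
                continue;
            }
            final char p2 = cs.charAt(++i);
            if ( p2 != 'x' ) {
                // not a \0x sequence: keep all three characters unchanged
                stringBuilder.append(p0).append(p1).append(p2);
                continue;
            }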
            final int d3 = Character.digit(cs.charAt(++i), 16) << 12;
            final int d2 = Character.digit(cs.charAt(++i), 16) << 8;
            final int d1 = Character.digit(cs.charAt(++i), 16) << 4;
            final int d0 = Character.digit(cs.charAt(++i), 16);
            stringBuilder.append((char) (d3 | d2 | d1 | d0));
        }
        return stringBuilder.toString();
    }

}
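
As a rough sketch of one possible improvement, a more defensive variant could check both the bounds and the hex digits before consuming anything, and copy any text that is not a complete \0x sequence through unchanged. The class name LenientZeroExDecoder and its helper tryParse are hypothetical names made up for this illustration; this is not part of the original answer and has only been reasoned through for the simple cases shown in the comments.

```java
// Hypothetical, more defensive variant of the decoder (illustration only).
final class LenientZeroExDecoder {

    // backslash + '0' + 'x' + four hex digits
    private static final int SEQUENCE_LENGTH = 7;

    private LenientZeroExDecoder() {
    }

    // Replaces every complete backslash-0x-HHHH sequence with the UTF-16 code
    // unit it denotes; everything else is copied to the output as is.
    static String decode(final CharSequence cs) {
        final int length = cs.length();
        final StringBuilder out = new StringBuilder(length);
        int i = 0;
        while ( i < length ) {
            final int codeUnit = tryParse(cs, i);
            if ( codeUnit >= 0 ) {
                out.append((char) codeUnit);
                i += SEQUENCE_LENGTH;
            } else {
                out.append(cs.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    // Returns the decoded code unit, or -1 if no complete sequence starts at offset.
    private static int tryParse(final CharSequence cs, final int offset) {
        if ( offset + SEQUENCE_LENGTH > cs.length()
                || cs.charAt(offset) != '\\'
                || cs.charAt(offset + 1) != '0'
                || cs.charAt(offset + 2) != 'x' ) {
            return -1;
        }
        int value = 0;
        for ( int k = offset + 3; k < offset + SEQUENCE_LENGTH; k++ ) {
            final int digit = Character.digit(cs.charAt(k), 16);
            if ( digit < 0 ) {
                return -1; // not a hex digit: treat the whole run as plain text
            }
            value = (value << 4) | digit;
        }
        return value;
    }

}
```

Because decode is a static method taking a CharSequence, it could be plugged into DecodingJsonReader.create as LenientZeroExDecoder::decode. It allocates a new StringBuilder per call, trading the buffer reuse of the decoder above for thread safety.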

Test:

public final class DecodingJsonReaderTest {

    private static final Gson gson = new GsonBuilder()
            .disableHtmlEscaping()
            .disableInnerClassSerialization()
            .create();

    // immutable, thread-safe, and can be shared globally
    private static final Function<CharSequence, String> nameDecoder = CharSequence::toString;

    @Test
    public void testCreate()
            throws IOException {
        // mutable, not thread-safe, should not be shared (but can be wisely reused)
        final Function<? super CharSequence, String> stringDecoder = ZeroExDecoder.create(new StringBuilder(128))::decode;
        final StringReader reader = new StringReader("[{\"name\":\"Test\",\"description\":\"\\\\0x3059\\\\0x3079\\\\0x3066\\\\0x306E Test \\\\0x30DB\\\\0x30B9\\\\0x30C8\\\\0x7528\\\\0x306E\\\\0x30C7\\\\0x30D5\\\\0x30A9\\\\0x30EB\\\\0x30C8 \\\\0x30DB\\\\0x30B9\\\\0x30C8 \\\\0x30B0\\\\0x30EB\\\\0x30FC\\\\0x30D7\\\\0x3002\"}]\n");
        try ( final JsonReader jsonReader = DecodingJsonReader.create(reader, nameDecoder, stringDecoder) ) {
            final String description = gson.<JsonElement>fromJson(jsonReader, JsonElement.class)
                    .getAsJsonArray()
                    .get(0)
                    .getAsJsonObject()
                    .getAsJsonPrimitive("description")
                    .getAsString();
            Assertions.assertEquals("すべての Test ホスト用のデフォルト ホスト グループ。", description);
        }
    }

}

The code above should be thoroughly tested, and fixed where necessary, before any real-world use.
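
For comparison, the string-replacement approach mentioned at the start of this answer might look roughly like the sketch below. It works on the raw JSON text, where each sequence appears as two backslashes followed by 0x and four hex digits, and rewrites it into a standard JSON \uXXXX escape that any JSON parser decodes on its own. The names ZeroExRewriter and toUnicodeEscapes are hypothetical. The caveat from above still applies: the replacement knows nothing about JSON syntax, so it will also rewrite matching text that was meant literally.

```java
import java.util.regex.Pattern;

// Hypothetical helper for the context-free string-replacement approach.
final class ZeroExRewriter {

    // In the raw JSON text the backslash is itself escaped, so each sequence
    // shows up as two backslashes, "0x", and exactly four hex digits.
    private static final Pattern ZERO_EX = Pattern.compile("\\\\\\\\0x([0-9A-Fa-f]{4})");

    private ZeroExRewriter() {
    }

    // Rewrites every match into a JSON Unicode escape: a single backslash,
    // the letter 'u', and the same four hex digits.
    static String toUnicodeEscapes(final String rawJson) {
        return ZERO_EX.matcher(rawJson).replaceAll("\\\\u$1");
    }

}
```

The rewritten text can then be handed to Gson as an ordinary string, for example with gson.fromJson(rewritten, JsonElement.class). This requires the whole document to be in memory, which is the buffering cost noted earlier.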

Answer #2 (8gsdolmq)

1. Read the JSON file through an InputStreamReader using an encoding such as UTF-8

// InputStream is abstract, so the file has to be opened with FileInputStream
InputStream inputStream = new FileInputStream(new File("input.text"));
InputStreamReader inputStreamReader = new InputStreamReader(inputStream, "UTF-8");

2. Convert the data from the InputStreamReader into an Object (of whatever type you need)

Gson gson = new Gson();
Type encodedData = gson.fromJson(inputStreamReader, Type.class);
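
The two steps above can be sketched with try-with-resources so that the file is always closed. This sketch uses Files.newBufferedReader instead of wrapping a FileInputStream by hand, which is a substitution on my part; the class name Utf8FileReading and the method readAll are hypothetical. It only drains the reader into a String to stay free of dependencies, and the same reader could instead be passed straight to gson.fromJson. Note that the charset only controls how bytes become characters, so on its own it does not decode the \0x sequences from the question.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: reads a whole file as UTF-8 text.
final class Utf8FileReading {

    private Utf8FileReading() {
    }

    static String readAll(final Path path) throws IOException {
        final StringBuilder text = new StringBuilder();
        // The reader decodes the file's bytes as UTF-8 and is closed automatically.
        try ( Reader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8) ) {
            final char[] buffer = new char[4096];
            int read;
            while ( (read = reader.read(buffer)) != -1 ) {
                text.append(buffer, 0, read);
            }
        }
        return text.toString();
    }

}
```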

References:
1. InputStreamReader(InputStream, java.nio.charset.Charset)
2. https://www.javadoc.io/doc/com.google.code.gson/gson/2.8.5/com/google/gson/Gson.html#fromJson-java.lang.String-java.lang.reflect.Type-
