What happens when I try to get the bytes for a string, but the conversion from chars to bytes overflows the integer length?

Posted by xtfmy6hx on 2021-06-29 in Java


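Given a string of length Integer.MAX_VALUE that contains characters needing more than one byte to represent, such as Chinese characters, what result do I get if I call String.getBytes()? Is there a good way to test for this kind of error?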

Source: https://stackoverflow.com/questions/65535585/what-happens-when-i-try-to-get-the-bytes-for-a-string-but-the-conversion-from-ch

3 Answers


Answer 1 (e3bfsja2)

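My question back to you is how you would come up with such a string in the first place. I couldn't find a way to build a string that large. Everything I tried gave me an error like: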

Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit

The longest string of two-byte characters I could build has a size in bytes just short of Integer.MAX_VALUE. I got it with:

String foo = "\uD83D".repeat((Integer.MAX_VALUE)/2-1);

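This gives you a string of 1073741822 chars, or 2147483644 bytes. So I can't answer the question for a string longer than this, but this string already causes an error when you try to convert it to bytes via: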

byte[] blah = foo.getBytes();

You get the error:

Exception in thread "main" java.lang.NegativeArraySizeException: -1073741830

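If you could come up with a string that is longer in bytes, I don't think you would fare any better. I hope this answers both your "what will happen" and "how would you test it" questions.
Here is my full test and its output: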

public class Test {
    public static void main(String[] args) {

        // Display MAX_VALUE
        System.out.println(Integer.MAX_VALUE);

        // By a bit of trial and error, build the longest two-byte character string possible with String.repeat()
        String foo = "\uD83D".repeat((Integer.MAX_VALUE)/2-1);

        // Display the number of bytes this string takes to store, which is just short of Integer.MAX_VALUE
        System.out.println(foo.length());
        System.out.println(foo.length()*2);

        // This line craps out even though the String length in bytes is less than Integer.MAX_VALUE
        byte[] blah = foo.getBytes();
    }
}

Result:

2147483647
1073741822
2147483644
Exception in thread "main" java.lang.NegativeArraySizeException: -1073741830
    at java.base/java.lang.StringCoding.encodeUTF8_UTF16(StringCoding.java:910)
    at java.base/java.lang.StringCoding.encodeUTF8(StringCoding.java:885)
    at java.base/java.lang.StringCoding.encode(StringCoding.java:489)
    at java.base/java.lang.String.getBytes(String.java:981)
    at Test.main(Test.java:15)
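The negative size in the trace is consistent with a worst-case output buffer of 3 bytes per char being computed in 32-bit int arithmetic. The following is a minimal sketch of that arithmetic only; the 3-bytes-per-char factor is an assumption inferred from the numbers above, and the class and method names are hypothetical, not JDK code.

```java
class OverflowArithmetic {
    // Worst-case size computed in int: the product wraps around silently.
    static int worstCaseAsInt(int chars) {
        return chars * 3;
    }

    // The same product computed in long: no wraparound.
    static long worstCaseAsLong(int chars) {
        return chars * 3L;
    }

    public static void main(String[] args) {
        int chars = Integer.MAX_VALUE / 2 - 1;       // 1073741822, as in the test above
        System.out.println(worstCaseAsLong(chars));  // 3221225466
        System.out.println(worstCaseAsInt(chars));   // -1073741830
    }
}
```

The wrapped value matches the size reported by the NegativeArraySizeException, which suggests the failure comes from sizing the buffer rather than from the actual encoded length.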

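You should be able to catch whatever is thrown while processing the string; failures may occur while building the string rather than while converting it to bytes. Just remember to catch Throwable: OutOfMemoryError is an Error and NegativeArraySizeException is a RuntimeException, and Throwable catches both.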
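As a narrower variant of this advice, one could catch only the two failures seen in this answer instead of every Throwable. This is a sketch under that assumption; the SafeEncode class and tryUtf8 method are hypothetical names, and other failure types may exist on other JDK versions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

class SafeEncode {
    // Returns the UTF-8 bytes, or empty if the JVM could not allocate the output.
    static Optional<byte[]> tryUtf8(String s) {
        try {
            return Optional.of(s.getBytes(StandardCharsets.UTF_8));
        } catch (NegativeArraySizeException | OutOfMemoryError e) {
            // Both were observed above for very large strings.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Two Chinese characters, 3 UTF-8 bytes each.
        Optional<byte[]> bytes = tryUtf8("\u6C49\u5B57");
        System.out.println(bytes.map(b -> b.length).orElse(-1)); // 6
    }
}
```

Recovering from OutOfMemoryError is best effort at most, so a caller would normally treat the empty result as "input too large" and stop.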

Posted 2021-06-29

Answer 2 (v440hwme)

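String is a complex immutable class. Historically it simply held a char[], an array of two-byte UTF-16 chars. Under that layout, String.getBytes(StandardCharsets.UTF_8) could indeed be expected to overflow the index range.
Nowadays, however, String holds a byte[] value, which allows strings to be stored compactly in another encoding. The problem still exists: for example, a compact ISO-8859-1 string of almost Integer.MAX_VALUE chars could blow up when encoded as UTF-8 (and even with String.toCharArray()), giving an OutOfMemoryError.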
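To make the growth concrete, here is a small sketch showing that a string of non-ASCII ISO-8859-1 characters takes one byte per char in that encoding but two in UTF-8. The class name Utf8Growth and the sample length of 1000 are illustrative choices, not part of the answer.

```java
import java.nio.charset.StandardCharsets;

class Utf8Growth {
    static int latin1Length(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // U+00E9 fits in one ISO-8859-1 byte but needs two bytes in UTF-8.
        String s = "\u00E9".repeat(1000); // String.repeat requires Java 11+
        System.out.println(latin1Length(s)); // 1000
        System.out.println(utf8Length(s));   // 2000
    }
}
```

Scaled up to a string of nearly Integer.MAX_VALUE such chars, the UTF-8 form would need roughly twice as many bytes as a single array can hold.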
So there may be several different overflows, but for UTF-16 chars with getBytes(UTF-8):

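private static final int MAX_INDEX = Integer.MAX_VALUE;

// Throws if the UTF-8 encoding of s would not fit in a single byte[].
void checkUtf8Bytes(String s) {
    if (s.length() < MAX_INDEX / 6) {
        return; // Short enough that even 6 bytes per char cannot overflow.
    }
    // Sum as long so that the total itself cannot wrap around.
    if (s.codePoints().mapToLong(this::bytesNeeded).sum() > MAX_INDEX) {
        throw new IllegalArgumentException("UTF-8 encoding exceeds the maximum array size");
    }
}

// Number of UTF-8 bytes needed to encode one code point.
private int bytesNeeded(int codePoint) {
    if (codePoint < 0x80) {
        return 1;
    } else if (codePoint < 0x800) {
        return 2;
    } else if (codePoint < 0x10000) {
        return 3;
    }
    return 4;
}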

I think catching OutOfMemoryError is easier.
Note that an ordinary string whose bytes hold UTF-16 chars can no longer contain more than about Integer.MAX_VALUE / 2 chars.

Posted 2021-06-29

Answer 3 (izj3ouym)

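Based on the source code of the JRE String class, it calls the "encode" method in the StringCoding class, which computes the maximum number of bytes needed for the given string and returns the result as an int. See the "encode" method, which calls "scale".
So, depending on the exact result, you get either a truncated string (if the result is positive) or an outright failure (if the result is negative). Since I did not follow the logic down into the ArrayEncoder class, an "array index out of bounds" exception may also occur during the conversion.
(The links point to some random copy of the source code on the internet and may not be the current code.)
This is probably only a theoretical problem: a string with 2 billion characters is unlikely to behave well anyway.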
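How an oversized estimate turns into an int depends on how it is computed. The sketch below contrasts two possibilities using hypothetical helpers (neither is claimed to be the JDK's actual code): a floating-point product narrowed to int saturates at Integer.MAX_VALUE, while a plain int multiplication wraps around to a negative number.

```java
class ScaleSketch {
    // Hypothetical: product computed in double, then narrowed to int.
    // Narrowing an out-of-range double yields Integer.MAX_VALUE.
    static int scaleViaDouble(int len, float factor) {
        return (int) (len * (double) factor);
    }

    // Hypothetical: product computed directly in int, which wraps on overflow.
    static int scaleViaInt(int len, int factor) {
        return len * factor;
    }

    public static void main(String[] args) {
        int len = 1_000_000_000;
        System.out.println(scaleViaDouble(len, 3.0f)); // 2147483647
        System.out.println(scaleViaInt(len, 3));       // -1294967296
    }
}
```

In the first case an allocation of that size would most likely fail with an OutOfMemoryError; in the second, with a NegativeArraySizeException like the one reported in the first answer.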

Posted 2021-06-29
