Compressing and decompressing string data in Java

goqiplq2 posted on 2023-03-11 in Java

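I am using the code below to compress and decompress string data. Compression runs without errors, but the decompress method throws the following error:
Exception in thread "main" java.io.IOException: Not in GZIP format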

public static void main(String[] args) throws Exception {
    String string = "I am what I am hhhhhhhhhhhhhhhhhhhhhhhhhhhhh"
            + "bjggujhhhhhhhhh"
            + "rggggggggggggggggggggggggg"
            + "esfffffffffffffffffffffffffffffff"
            + "esffffffffffffffffffffffffffffffff"
            + "esfekfgy enter code here`etd`enter code here wdd"
            + "heljwidgutwdbwdq8d"
            + "skdfgysrdsdnjsvfyekbdsgcu"
            + "jbujsbjvugsduddbdj";

    System.out.println("after compress:");
    String compressed = compress(string);
    System.out.println(compressed);
    System.out.println("after decompress:");
    String decomp = decompress(compressed);
    System.out.println(decomp);
}

public static String compress(String str) throws Exception {
    if (str == null || str.length() == 0) {
        return str;
    }
    System.out.println("String length : " + str.length());
    ByteArrayOutputStream obj=new ByteArrayOutputStream();
    GZIPOutputStream gzip = new GZIPOutputStream(obj);
    gzip.write(str.getBytes("UTF-8"));
    gzip.close();
    String outStr = obj.toString("UTF-8");
    System.out.println("Output String length : " + outStr.length());
    return outStr;
}

public static String decompress(String str) throws Exception {
    if (str == null || str.length() == 0) {
        return str;
    }
    System.out.println("Input String length : " + str.length());
    GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(str.getBytes("UTF-8")));
    BufferedReader bf = new BufferedReader(new InputStreamReader(gis, "UTF-8"));
    String outStr = "";
    String line;
    while ((line=bf.readLine())!=null) {
        outStr += line;
    }
    System.out.println("Output String lenght : " + outStr.length());
    return outStr;
}

I still can't figure out how to fix this!

pzfprimi 1#

This is because of this line:

String outStr = obj.toString("UTF-8");

Pass along the byte[] that you can get from your ByteArrayOutputStream and use it as-is in a ByteArrayInputStream to construct the GZIPInputStream. These are the changes needed in your code:

byte[] compressed = compress(string); //In the main method

public static byte[] compress(String str) throws Exception {
    ...
    ...
    return obj.toByteArray();
}

public static String decompress(byte[] bytes) throws Exception {
    ...
    GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(bytes));
    ...
}

laximzn5 2#

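The answers above solve the problem, but in addition, if we try to decompress a byte[] that was never compressed ("not in zip format"), we get the exception message "Not in GZIP format".
To handle that case, we can add the following check to the class.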

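public static boolean isCompressed(final byte[] compressed) {
    // A GZIP stream starts with the two magic bytes 0x1f 0x8b; the length check avoids
    // an ArrayIndexOutOfBoundsException on inputs shorter than two bytes.
    return (compressed.length >= 2) && (compressed[0] == (byte) (GZIPInputStream.GZIP_MAGIC)) && (compressed[1] == (byte) (GZIPInputStream.GZIP_MAGIC >> 8));
}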

The complete compression class with compress/decompress looks like this:

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GZIPCompression {
  public static byte[] compress(final String str) throws IOException {
    if ((str == null) || (str.length() == 0)) {
      return null;
    }
    ByteArrayOutputStream obj = new ByteArrayOutputStream();
    GZIPOutputStream gzip = new GZIPOutputStream(obj);
    gzip.write(str.getBytes("UTF-8"));
    gzip.flush();
    gzip.close();
    return obj.toByteArray();
  }

  public static String decompress(final byte[] compressed) throws IOException {
    final StringBuilder outStr = new StringBuilder();
    if ((compressed == null) || (compressed.length == 0)) {
      return "";
    }
    if (isCompressed(compressed)) {
      final GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(compressed));
      final BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(gis, "UTF-8"));

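      // Note: readLine() strips line terminators, so multi-line text comes back as one line.
      String line;
      while ((line = bufferedReader.readLine()) != null) {
        outStr.append(line);
      }
    } else {
      // Not GZIP data: decode the bytes as plain UTF-8 text
      // (appending the byte[] itself would only append its identity string).
      outStr.append(new String(compressed, "UTF-8"));
    }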
    return outStr.toString();
  }

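  public static boolean isCompressed(final byte[] compressed) {
    // The length check avoids an ArrayIndexOutOfBoundsException on one-byte inputs.
    return (compressed.length >= 2) && (compressed[0] == (byte) (GZIPInputStream.GZIP_MAGIC)) && (compressed[1] == (byte) (GZIPInputStream.GZIP_MAGIC >> 8));
  }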
}

50pmv0ei 3#

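If you need to transfer the compressed content over the network or store it as text, you have to use a Base64 encoder (such as Apache Commons Codec Base64) to convert the byte array to a Base64 string, and decode the string back to a byte array on the remote client.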
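
As a rough illustration of that idea, here is a minimal sketch that uses the JDK's own java.util.Base64 (Java 8+) in place of Apache Commons Codec. The class name GzipBase64 and its two method names are made up for this example and do not come from the thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: GZIP-compresses text and carries the result as Base64 text.
final class GzipBase64 {

    // Compresses the UTF-8 bytes of the text, then encodes the binary result as Base64.
    static String compressToBase64(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed at this point, so the GZIP trailer has been written.
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    // Decodes the Base64 text back to bytes, then decompresses them to a String.
    static String decompressFromBase64(String base64) {
        byte[] compressed = Base64.getDecoder().decode(base64);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = compressToBase64("I am what I am");
        System.out.println(encoded);                       // safe to send or store as text
        System.out.println(decompressFromBase64(encoded)); // the original text again
    }
}
```

Copying raw bytes instead of calling readLine() keeps line breaks in the original text intact. Base64 makes the payload roughly a third larger, so for short strings the encoded result can end up longer than the input.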

jhdbpxl9 4#

Another example of correct compression and decompression:

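// Imports inferred from usage; the answer assumes Lombok and Apache Commons IO are on the classpath.
import static java.nio.charset.StandardCharsets.UTF_8;
import static java.util.Objects.isNull;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

import lombok.extern.slf4j.Slf4j;
import org.apache.commons.io.IOUtils;

@Slf4j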
public class GZIPCompression {
    public static byte[] compress(final String stringToCompress) {
        if (isNull(stringToCompress) || stringToCompress.length() == 0) {
            return null;
        }

        try (final ByteArrayOutputStream baos = new ByteArrayOutputStream();
            final GZIPOutputStream gzipOutput = new GZIPOutputStream(baos)) {
            gzipOutput.write(stringToCompress.getBytes(UTF_8));
            gzipOutput.finish();
            return baos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException("Error while compression!", e);
        }
    }

    public static String decompress(final byte[] compressed) {
        if (isNull(compressed) || compressed.length == 0) {
            return null;
        }

        try (final GZIPInputStream gzipInput = new GZIPInputStream(new ByteArrayInputStream(compressed));
             final StringWriter stringWriter = new StringWriter()) {
            IOUtils.copy(gzipInput, stringWriter, UTF_8);
            return stringWriter.toString();
        } catch (IOException e) {
            throw new UncheckedIOException("Error while decompression!", e);
        }
    }
}

zaq34kh6 5#

The problem is this line:

String outStr = obj.toString("UTF-8");

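The byte array obj contains arbitrary binary data. You cannot "decode" arbitrary binary data as if it were UTF-8. If you try, you get a String that cannot be "encoded" back to the same bytes. Or at least, the bytes you get will differ from the ones you started with, to the extent that they are no longer a valid GZIP stream.
The fix is to store or transmit the contents of the byte array as-is. Don't try to convert it into a String. It is binary data, not text.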
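
The effect is easy to reproduce. The following is a small sketch, not code from the thread; the class name Utf8RoundTripDemo and its methods are invented for illustration. The second byte of every GZIP header is 0x8b, which is not a valid UTF-8 sequence on its own, so decoding replaces it with U+FFFD and re-encoding yields different bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class: shows that GZIP bytes do not survive a UTF-8 round trip.
final class Utf8RoundTripDemo {

    // GZIP-compresses the UTF-8 bytes of the given text.
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Decodes the bytes as UTF-8 and encodes the resulting String again,
    // which is what the question's compress/decompress pair does in effect.
    static byte[] roundTripThroughUtf8(byte[] bytes) {
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] original = gzip("I am what I am");
        byte[] mangled = roundTripThroughUtf8(original);
        System.out.println("original length: " + original.length);
        System.out.println("mangled length:  " + mangled.length);
        System.out.println("identical:       " + Arrays.equals(original, mangled));
    }
}
```

Because the mangled array no longer starts with the bytes 0x1f 0x8b, GZIPInputStream rejects it with "Not in GZIP format".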

0g0grzrc 6#

The client sends messages that need to be compressed, and the server side (Kafka) decompresses the string messages.

Here is my sample:
Compress:

public static String compress(String str, String inEncoding) {
        if (str == null || str.length() == 0) {
            return str;
        }
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            GZIPOutputStream gzip = new GZIPOutputStream(out);
            gzip.write(str.getBytes(inEncoding));
            gzip.close();
            return URLEncoder.encode(out.toString("ISO-8859-1"), "UTF-8");
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

Decompress:

public static String decompress(String str, String outEncoding) {
        if (str == null || str.length() == 0) {
            return str;
        }

        try {
            String decode = URLDecoder.decode(str, "UTF-8");

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ByteArrayInputStream in = new ByteArrayInputStream(decode.getBytes("ISO-8859-1"));
            GZIPInputStream gunzip = new GZIPInputStream(in);
            byte[] buffer = new byte[256];
            int n;
            while ((n = gunzip.read(buffer)) >= 0) {
                out.write(buffer, 0, n);
            }
            return out.toString(outEncoding);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

dphi5xsq 7#

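You cannot convert binary data to a String directly. As a solution, you can encode the binary data first and then convert it to a String. For example, see "How do you convert binary data to Strings and back in Java?"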

rsaldnfx 8#

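In the decompress method, we should decode the bytes with a Base64 decoder. Doing so gets rid of this exception. Note that this only works if the compress side Base64-encodes its output.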

byte[] bytes = str.getBytes("UTF-8");
// Base64 here is presumably org.apache.commons.codec.binary.Base64 (Apache Commons Codec).
bytes = Base64.decodeBase64(bytes);

GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(bytes));

Adding and adjusting the lines above in the decompress method resolves the problem.
