I have a simple protobuf schema, shown below:
Proto:
option java_outer_classname = "SimpleRecords";
message Record {
    required int64 number = 1;
}
I use the following code to generate the parquet file.
int pageSize = 4 * 1024 * 1024;
LongGenerator longGenerator = new LongGenerator(500_000_000L);
Path filePath = new Path("benchmark/numbers.parquet");
long startTime = System.nanoTime();
try (ParquetWriter<SimpleRecords.Record> writer = new ProtoParquetWriter<>(
        filePath, SimpleRecords.Record.class, CompressionCodecName.SNAPPY, 32 * pageSize, pageSize)) {
    SimpleRecords.Record.Builder recordBuilder = SimpleRecords.Record.newBuilder();
    for (Long i : longGenerator) {
        recordBuilder.setNumber(i);
        writer.write(recordBuilder.build());
    }
} catch (IOException e) {
    e.printStackTrace();
}
long endTime = System.nanoTime();
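I measured the time it takes to generate the parquet file: about 103 seconds with version 1.8.1, but 2167 seconds with version 1.11.0. Why is 1.11.0 so much slower?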
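For reference, here is a minimal sketch of how the two System.nanoTime() readings can be turned into the numbers quoted above. The class name ElapsedTime and the helper methods elapsedSeconds and slowdown are hypothetical names used only for illustration; they are not part of the benchmark code or of any Parquet API.

```java
import java.util.concurrent.TimeUnit;

public class ElapsedTime {
    // Converts the difference between two System.nanoTime() readings to whole seconds.
    static long elapsedSeconds(long startNanos, long endNanos) {
        return TimeUnit.NANOSECONDS.toSeconds(endNanos - startNanos);
    }

    // Ratio between an observed duration and a baseline duration (hypothetical helper).
    static double slowdown(long baselineSeconds, long observedSeconds) {
        return (double) observedSeconds / baselineSeconds;
    }

    public static void main(String[] args) {
        long startTime = System.nanoTime();
        // ... the write loop from the question would run here ...
        long endTime = System.nanoTime();
        System.out.println("Elapsed: " + elapsedSeconds(startTime, endTime) + " s");
        // Timings reported in the question: 103 s (1.8.1) vs 2167 s (1.11.0).
        System.out.println("Slowdown: " + slowdown(103, 2167));
    }
}
```

With the timings reported here, the ratio works out to roughly 21, so 1.11.0 is about 21 times slower than 1.8.1 for this workload.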