Q: Converting Avro to Parquet in memory

Posted by eqfvzcg8 on 2021-06-02 in Hadoop


I am receiving Avro records from Kafka and want to convert them into Parquet files. I am following this blog post: http://blog.cloudera.com/blog/2014/05/how-to-convert-existing-data-into-parquet/
So far, the code looks roughly like this:

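final String fileName;      // target file name, supplied by the caller
final SinkRecord record;    // Kafka Connect record carrying the Avro value, supplied by the caller
final AvroData avroData;    // converts Connect schemas to Avro schemas, supplied by the caller
final Schema avroSchema = avroData.fromConnectSchema(record.valueSchema());
CompressionCodecName compressionCodecName = CompressionCodecName.SNAPPY;
int blockSize = 256 * 1024 * 1024;  // 256 MiB
int pageSize = 64 * 1024;           // 64 KiB
Path path = new Path(fileName);
writer = new AvroParquetWriter<>(path, avroSchema, compressionCodecName, blockSize, pageSize);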

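This does the Avro to Parquet conversion, but it writes the Parquet file to disk. I was wondering if there is an easier way to just keep the file in memory, so that I don't have to manage temporary files on disk. Thank you.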

Java hadoop avro parquet

Source: https://stackoverflow.com/questions/39631812/q-converting-avro-to-parquet-in-memory

1 Answer


beq87vna1#

"but it will write the Parquet file to the disk"
"if there was an easier way to just keep the file in memory"

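From your question, I understand that you don't want to write partial files to Parquet. If you want to write the complete file to disk in Parquet format and keep the temporary data in memory, you can combine a memory-mapped file with the Parquet format.
Write your data to a memory-mapped file and, once the writes are done, convert the bytes to Parquet format and store the result on disk.
Have a look at MappedByteBuffer.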
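The staging step suggested above can be sketched with the JDK alone. This is a minimal illustration, not the answerer's code: the class name `MappedBufferSketch`, the method `stageAndReadBack`, and the sample payload are hypothetical, and the Parquet conversion itself is left out. Note that a `MappedByteBuffer` obtained from `FileChannel.map` is still backed by a file on disk, so this approach changes how the bytes are accessed rather than removing the temporary file.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: stages raw bytes in a memory-mapped region and reads them back.
class MappedBufferSketch {

    // Maps `payload.length` bytes of `file` into memory, writes the payload,
    // then returns a copy of what the mapped region contains.
    static byte[] stageAndReadBack(Path file, byte[] payload) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // READ_WRITE mapping: the file is extended to the mapped size if it is shorter.
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, payload.length);
            buffer.put(payload);   // write through the mapping
            buffer.flip();         // switch from writing to reading
            byte[] copy = new byte[buffer.remaining()];
            buffer.get(copy);
            return copy;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("staging", ".bin");
        try {
            byte[] staged = stageAndReadBack(
                    tmp, "avro-record-bytes".getBytes(StandardCharsets.UTF_8));
            System.out.println(new String(staged, StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

In a real pipeline, the bytes read back from the mapping would be handed to whatever performs the Avro to Parquet conversion, such as the `AvroParquetWriter` from the question.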

2021-06-02
