Flume: directory to Avro -> Avro to HDFS - not valid Avro after transfer

Posted by bqf10yzr on 2021-06-03 in Hadoop


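I have users writing Avro files, and I want to use Flume to move all of those files into HDFS so that I can later query and analyze the data with Hive or Pig.
On the client I installed Flume and configured a spooldir source and an Avro sink like this: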

a1.sources = src1
a1.sinks = sink1
a1.channels = c1
a1.channels.c1.type = memory
a1.sources.src1.type = spooldir
a1.sources.src1.channels = c1
a1.sources.src1.spoolDir = {directory}
a1.sources.src1.fileHeader = true
a1.sources.src1.deserializer = avro
a1.sinks.sink1.type = avro
a1.sinks.sink1.channel = c1
a1.sinks.sink1.hostname = {IP}
a1.sinks.sink1.port = 41414

On the Hadoop cluster I have an Avro source and an HDFS sink:

a1.sources = avro1
a1.sinks = sink1
a1.channels = c1
a1.channels.c1.type = memory
a1.sources.avro1.type = avro
a1.sources.avro1.channels = c1
a1.sources.avro1.bind = 0.0.0.0
a1.sources.avro1.port = 41414
a1.sinks.sink1.type = hdfs
a1.sinks.sink1.channel = c1
a1.sinks.sink1.hdfs.path = {hdfs dir}
a1.sinks.sink1.hdfs.fileSuffix = .avro
a1.sinks.sink1.hdfs.rollSize = 67108864
a1.sinks.sink1.hdfs.fileType = DataStream

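The problem is that the files on HDFS are not valid Avro files! I am using the Hue UI to check whether a file is a valid Avro file. If I upload an Avro file that I generated on my PC to the cluster, I can see its contents fine. But the files written by Flume are not valid Avro files.
I tried the flume avro-client that ships with Flume, but it didn't work because it sends one Flume event per line, which breaks the Avro files. That problem is fixed by the spooldir source using deserializer = avro . So I think the problem is in the HDFS sink, when it writes the files.
With hdfs.fileType = DataStream it writes the values from the Avro fields rather than the whole Avro file, losing all the schema information. If I use hdfs.fileType = SequenceFile the files are not valid either, for some reason.
Any ideas?
Thanks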
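As a quick sanity check outside Hue, one could test whether a file at least begins with the Avro object container magic bytes (`Obj` followed by the byte 1, per the Avro specification). The following is only a minimal sketch: the function names are illustrative, and it assumes the file has first been copied from HDFS to the local disk. Passing this check is necessary but not sufficient for a file to be valid Avro.

```python
# First four bytes of every Avro object container file, per the Avro spec.
AVRO_MAGIC = b"Obj\x01"


def has_avro_magic(data: bytes) -> bool:
    """Return True if the given bytes start with the Avro container magic."""
    return data[:4] == AVRO_MAGIC


def file_has_avro_magic(path: str) -> bool:
    """Read the first four bytes of a local file and check them."""
    with open(path, "rb") as f:
        return has_avro_magic(f.read(4))
```

A file written with hdfs.fileType = DataStream and no Avro serializer would be expected to fail this check, since only the raw event bodies are written.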

hadoop hdfs flume avro

Source: https://stackoverflow.com/questions/21617025/flume-directory-to-avro-avro-to-hdfs-not-valid-avro-after-transfer

1 Answer


06odsfpq1#

You have to add this to your HDFS sink configuration (the value of this property defaults to TEXT ):

a1.sinks.sink1.serializer = avro_event

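This should write valid Avro files, but with the default schema.
However, since you are using Avro files as your input, you probably want to write Avro files with the same schema. For that you can use the AvroEventSerializer from Cloudera's CDK. Assuming you have built the code and placed the jar in Flume's lib directory, you can now define the serializer in the properties file: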

a1.sinks.sink1.serializer = org.apache.flume.serialization.AvroEventSerializer$Builder

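The serializer assumes that the Avro schema is present in the header of every event, either as a URL or as a literal. To use the latter approach (which is less efficient, but might be easier to try out), you must tell the source on the client side to add the schema literal to every event, by adding this property: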

a1.sources.src1.deserializer.schemaType = LITERAL

2021-06-03
