Flume to HDFS splits one file into many small files

ehxuflar · posted 2021-06-03 in Hadoop

I am trying to transfer a 700 MB log file from Flume to HDFS. I have configured the Flume agent as follows:

...
tier1.channels.memory-channel.type = memory
...
tier1.sinks.hdfs-sink.channel = memory-channel
tier1.sinks.hdfs-sink.type = hdfs
tier1.sinks.hdfs-sink.path = hdfs://***
tier1.sinks.hdfs-sink.fileType = DataStream
tier1.sinks.hdfs-sink.rollSize = 0

The source is a spooldir, the channel is memory, and the sink is hdfs.
I also tried sending a 1 MB file, and Flume split it into 1,000 files of about 1 KB each. Another thing I noticed is that the transfer is very slow, taking roughly one minute for 1 MB. Am I doing something wrong?


nxowjjhe · answer #1

You also need to disable the roll timeout, which is done with the following settings:

tier1.sinks.hdfs-sink.hdfs.rollCount = 0
tier1.sinks.hdfs-sink.hdfs.rollInterval = 300

Setting rollCount to 0 prevents rolling by event count; rollInterval is set here to 300 seconds, and setting it to 0 would disable time-based rolling entirely. You have to choose which roll trigger you want, otherwise Flume only closes a file when it shuts down. This is also what most likely chopped up your 1 MB test: with the default rollCount of 10 and the spooldir source producing one event per line, a new file is started after every 10 lines.
The defaults are as follows:

hdfs.rollInterval   30     Number of seconds to wait before rolling the current file (0 = never roll based on time interval)
hdfs.rollSize       1024   File size to trigger a roll, in bytes (0 = never roll based on file size)
hdfs.rollCount      10     Number of events written to the file before it is rolled (0 = never roll based on number of events)
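
Putting the two together, the sink should roll on exactly one trigger. Below is a minimal sketch of a complete agent definition, reusing the component names from the question; the spool directory path, channel capacity, and the 128 MB roll threshold are illustrative, not from the original post. Also note that every sink-specific HDFS setting needs the hdfs. prefix: the question's fileType and rollSize lines are missing it, so Flume silently ignores them and falls back to the defaults above.

# Sketch only: spoolDir, capacity, and the 128 MB rollSize are assumptions.
tier1.sources  = spool-source
tier1.channels = memory-channel
tier1.sinks    = hdfs-sink

tier1.sources.spool-source.type     = spooldir
tier1.sources.spool-source.spoolDir = /var/log/flume-spool
tier1.sources.spool-source.channels = memory-channel

tier1.channels.memory-channel.type     = memory
tier1.channels.memory-channel.capacity = 10000

tier1.sinks.hdfs-sink.channel       = memory-channel
tier1.sinks.hdfs-sink.type          = hdfs
tier1.sinks.hdfs-sink.hdfs.path     = hdfs://***
tier1.sinks.hdfs-sink.hdfs.fileType = DataStream
# Roll only on size (~128 MB); disable count- and time-based rolling.
tier1.sinks.hdfs-sink.hdfs.rollSize     = 134217728
tier1.sinks.hdfs-sink.hdfs.rollCount    = 0
tier1.sinks.hdfs-sink.hdfs.rollInterval = 0

With this layout each output file grows until it reaches the size threshold, so a 700 MB input ends up in a handful of large files instead of thousands of 1 KB ones.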
