Flume to HDFS splits one file into many files

ehxuflar  posted on 2021-06-03  in Hadoop

I am trying to transfer a 700 MB log file from Flume to HDFS. I have configured the Flume agent as follows:

...
tier1.channels.memory-channel.type = memory
...
tier1.sinks.hdfs-sink.channel = memory-channel
tier1.sinks.hdfs-sink.type = hdfs
tier1.sinks.hdfs-sink.path = hdfs://***
tier1.sinks.hdfs-sink.fileType = DataStream
tier1.sinks.hdfs-sink.rollSize = 0

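The source is a spooldir, the channel is memory, and the sink is hdfs.
I also tried sending a 1 MB file, and Flume split it into 1000 files of 1 KB each. Another thing I noticed is that the transfer is very slow: 1 MB takes about 1 minute. Am I doing something wrong?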

nxowjjhe #1

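Setting rollSize to 0 is not enough. You also need to deal with the other roll triggers, the event count and the roll timeout, which is done with the following settings: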

tier1.sinks.hdfs-sink.hdfs.rollCount = 0
tier1.sinks.hdfs-sink.hdfs.rollInterval = 300

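Setting rollCount to 0 prevents rolling based on the number of events. rollInterval is set to 300 seconds here; setting it to 0 would disable the time-based roll as well. You have to choose which roll mechanism you want, otherwise Flume will only close the file when it shuts down.
The default values are as follows: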

Property            Default  Description
hdfs.rollInterval   30       Number of seconds to wait before rolling the current file (0 = never roll based on time interval)
hdfs.rollSize       1024     File size to trigger a roll, in bytes (0 = never roll based on file size)
hdfs.rollCount      10       Number of events written to the file before it is rolled (0 = never roll based on number of events)
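
The settings in the answer roll by time only. As an alternative, here is a minimal sketch of rolling by size only. It assumes the same agent and sink names as in the question (tier1, hdfs-sink). The 128 MiB threshold is an illustrative value, not something from the original post.

```properties
# Assumption: roll by file size only. 134217728 bytes = 128 MiB (illustrative value).
tier1.sinks.hdfs-sink.hdfs.rollSize = 134217728
# 0 disables the event-count trigger (default is 10 events).
tier1.sinks.hdfs-sink.hdfs.rollCount = 0
# 0 disables the time-based trigger (default is 30 seconds).
tier1.sinks.hdfs-sink.hdfs.rollInterval = 0
```

With a setup like this, a 700 MB input should end up as roughly six files rather than thousands. The last file stays open until some trigger or a shutdown closes it. Note that the roll properties carry the hdfs. prefix, as in the answer's settings. If the rollSize line in the real configuration file lacks that prefix, as it does in the question's snippet, the setting may not be picked up and the default of 1024 bytes would still apply.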
