Flume keeps the .tmp file and does not fully copy the file to HDFS

bxjv4tth · posted 2021-06-02 in Hadoop

Hi, I am using Flume to copy files from a spooling directory to HDFS, with a file channel.


# Component names
a1.sources = src
a1.channels = c1
a1.sinks = k1

# Source details
a1.sources.src.type = spooldir
a1.sources.src.channels = c1
a1.sources.src.spoolDir = /home/cloudera/onetrail
a1.sources.src.fileHeader = false
a1.sources.src.basenameHeader = true
# a1.sources.src.basenameHeaderKey = basename
a1.sources.src.fileSuffix = .COMPLETED
a1.sources.src.threads = 4
a1.sources.src.interceptors = newint
a1.sources.src.interceptors.newint.type = timestamp

# Sink details
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs:///data/contentProviders/cnet/%Y%m%d/
# a1.sinks.k1.hdfs.round = false
# a1.sinks.k1.hdfs.roundValue = 1
# a1.sinks.k1.hdfs.roundUnit = second
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.filePrefix = %{basename}
# a1.sinks.k1.hdfs.fileSuffix = .xml
# HDFS sink properties need the "hdfs." prefix
a1.sinks.k1.hdfs.threadsPoolSize = 4

# use a single file at a time
a1.sinks.k1.hdfs.maxOpenFiles = 1

# rollover file based on maximum size of 10 MB
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.batchSize = 12

# Channel details
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /tmp/flume/checkpoint/
a1.channels.c1.dataDirs = /tmp/flume/data/

# Bind the source and sink to the channel
a1.sources.src.channels = c1
a1.sinks.k1.channel = c1

With the above configuration the files are copied to HDFS, but the problem I am facing is that one file stays as .tmp and its contents are never copied completely.
Can someone help me figure out what is wrong?

htrmnn0y:

The .tmp file is renamed to its final name once Flume "rolls" it.
All of your roll settings are 0, which means "keep the stream open forever."
Set one or more of them to a non-zero value so that Flume knows when to consider a file finished, close it, and start the next one.
See the Flume documentation on the HDFS sink for more details.
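For example, a minimal sketch of corrected roll settings, assuming the 10 MB target mentioned in the comment in your own config (hdfs.rollSize is in bytes; the 300-second hdfs.rollInterval is an illustrative value, not from the original post):

# roll a file once it reaches ~10 MB, or after it has been
# open for 300 seconds, whichever happens first
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollInterval = 300
a1.sinks.k1.hdfs.rollSize = 10485760

If your input arrives in bursts, setting hdfs.idleTimeout (seconds with no new events before the open file is closed and renamed) can also help, so the last .tmp file of a burst does not stay open waiting for a roll that never comes.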
