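I am using Cloudera's Flume with a spooling directory source and HDFS as the sink. I am running into a "serializer has been closed" error. I copy only one file at a time, and the error appears after I copy the first file with scp.
My agent configuration is as follows: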
agentaccesscombined.sources=spooldir-accesscombinedsource
agentaccesscombined.sinks=hdfs-accesscombinedsink
agentaccesscombined.channels=chaccesscombined
# flume spooldir source
agentaccesscombined.sources.spooldir-accesscombinedsource.type=spooldir
agentaccesscombined.sources.spooldir-accesscombinedsource.spoolDir=/var/spoolAccessCombinedDir
agentaccesscombined.sources.spooldir-accesscombinedsource.ignorePattern=\\w.*.filepart
agentaccesscombined.sources.spooldir-accesscombinedsource.deletePolicy=immediate
agentaccesscombined.sources.spooldir-accesscombinedsource.fileSuffix=.SPOOL
agentaccesscombined.sources.spooldir-accesscombinedsource.fileHeader=true
agentaccesscombined.sources.spooldir-accesscombinedsource.bufferMaxLineLength=70000
agentaccesscombined.sources.spooldir-accesscombinedsource.bufferMaxLines=10000
agentaccesscombined.sources.spooldir-accesscombinedsource.batchSize=1000
agentaccesscombined.sources.spooldir-accesscombinedsource.fileHeaderKey=file
#flume hdfs-sink
agentaccesscombined.sinks.hdfs-accesscombinedsink.type=hdfs
agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.path=hdfs://cldx-1044:1200:8020/flumeOut_spoolDir_access_combined
agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.rollSize=12553700
agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.rollCount=12553665
agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.rollInterval=100000
agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.fileType=DataStream
agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.writeFormat=Text
agentaccesscombined.sinks.hdfs-accesscombinedsink.round = true
agentaccesscombined.sinks.hdfs-accesscombinedsink.roundValue=50
agentaccesscombined.sinks.hdfs-accesscombinedsink.roundUnit=minute
agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.idleTimeout=5
#flume channel
agentaccesscombined.channels.chaccesscombined.type=file
agentaccesscombined.channels.chaccesscombined.capacity=1000000
agentaccesscombined.channels.chaccesscombined.transactionCapacity = 1000
agentaccesscombined.channels.chaccesscombined.checkpointInterval=30000
agentaccesscombined.channels.chaccesscombined.maxFileSize=2146435071
agentaccesscombined.channels.chaccesscombined.minimumRequiredSpace=524288000
agentaccesscombined.channels.chaccesscombined.keep-alive=30
agentaccesscombined.channels.chaccesscombined.write-timeout=30
agentaccesscombined.channels.chaccesscombined.checkpoint-timeout=6000
agentaccesscombined.channels.chaccesscombined.checkpointDir=/tmp/flume/java/checkpoint_accesscombined
agentaccesscombined.channels.chaccesscombined.dataDirs=/tmp/flume/java/data_accesscombined
agentaccesscombined.sources.spooldir-accesscombinedsource.channels=chaccesscombined
agentaccesscombined.sinks.hdfs-accesscombinedsink.channel=chaccesscombined
If I copy the files with WinSCP everything works fine, but with scp it does not. Please help.
Thanks in advance.
2 answers
pkmbmrz71#
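To fix the immediate problem, restart the Flume agent. Then switch to a method that delivers files atomically.
The spooling directory source requires that a file not change once Flume has started reading it. If the file does change, the source logs an error message and starts producing errors like the one described in the question.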
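cp is not atomic, and I don't know whether scp and similar tools are. One option is to copy the file to a temporary directory first and then use mv to move it into the spooling directory.
acruukt92#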
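You can use WinSCP to upload the file to a temporary directory and then move it into the directory Flume monitors with mv. The mv operation is atomic as long as both directories are on the same filesystem. You may need inotify to automate this.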
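
The copy-then-mv approach suggested in both answers could be sketched roughly as below. This is only an illustration, not part of Flume: the function name `deliver_atomically` and the idea of a separate staging directory are assumptions, and the staging directory is assumed to live on the same filesystem as the spooling directory so that the final rename is a single atomic step.

```python
import os
import shutil


def deliver_atomically(src_path, staging_dir, spool_dir):
    """Copy src_path into staging_dir, then rename it into spool_dir.

    Hypothetical helper: staging_dir and spool_dir are assumed to be on
    the same filesystem. If they are not, os.rename() raises OSError
    instead of falling back to a slow, non-atomic copy.
    """
    name = os.path.basename(src_path)
    staged = os.path.join(staging_dir, name)
    final = os.path.join(spool_dir, name)

    # Refuse to overwrite a file that is already in the spooling directory,
    # since the source expects files there to stay unchanged.
    if os.path.exists(final):
        raise FileExistsError(final)

    # The slow, non-atomic part happens outside the directory Flume watches.
    shutil.copyfile(src_path, staged)

    # The file appears in the spooling directory in one step, fully written.
    os.rename(staged, final)
    return final
```

With the configuration from the question, `spool_dir` would be `/var/spoolAccessCombinedDir`, while the staging directory would be any other directory on the same filesystem (its path is up to you). Triggering the move automatically when an upload finishes, for example with inotify, is not covered by this sketch.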