Flume spooling directory source stuck in an exception [serializer has been closed]

Posted by lsmepo6l on 2021-06-03 in Flume

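I am using the spooling directory source in Cloudera's Flume, with HDFS as the sink. I am running into a "serializer has been closed" error. I copy only one file at a time, and the error appears right after I copy the first file with scp.
My agent configuration is as follows: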

    agentaccesscombined.sources=spooldir-accesscombinedsource
    agentaccesscombined.sinks=hdfs-accesscombinedsink
    agentaccesscombined.channels=chaccesscombined

    # flume spooldir source
    agentaccesscombined.sources.spooldir-accesscombinedsource.type=spooldir
    agentaccesscombined.sources.spooldir-accesscombinedsource.spoolDir=/var/spoolAccessCombinedDir
    agentaccesscombined.sources.spooldir-accesscombinedsource.ignorePattern=\\w.*.filepart
    agentaccesscombined.sources.spooldir-accesscombinedsource.deletePolicy=immediate
    agentaccesscombined.sources.spooldir-accesscombinedsource.fileSuffix=.SPOOL
    agentaccesscombined.sources.spooldir-accesscombinedsource.fileHeader=true
    agentaccesscombined.sources.spooldir-accesscombinedsource.bufferMaxLineLength=70000
    agentaccesscombined.sources.spooldir-accesscombinedsource.bufferMaxLines=10000
    agentaccesscombined.sources.spooldir-accesscombinedsource.batchSize=1000
    agentaccesscombined.sources.spooldir-accesscombinedsource.fileHeaderKey=file

    #flume hdfs-sink
    agentaccesscombined.sinks.hdfs-accesscombinedsink.type=hdfs
    agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.path=hdfs://cldx-1044:1200:8020/flumeOut_spoolDir_access_combined
    agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.rollSize=12553700
    agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.rollCount=12553665
    agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.rollInterval=100000
    agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.fileType=DataStream
    agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.writeFormat=Text
    agentaccesscombined.sinks.hdfs-accesscombinedsink.round = true
    agentaccesscombined.sinks.hdfs-accesscombinedsink.roundValue=50
    agentaccesscombined.sinks.hdfs-accesscombinedsink.roundUnit=minute
    agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.idleTimeout=5

    #flume channel 
    agentaccesscombined.channels.chaccesscombined.type=file
    agentaccesscombined.channels.chaccesscombined.capacity=1000000
    agentaccesscombined.channels.chaccesscombined.transactionCapacity = 1000
    agentaccesscombined.channels.chaccesscombined.checkpointInterval=30000
    agentaccesscombined.channels.chaccesscombined.maxFileSize=2146435071
    agentaccesscombined.channels.chaccesscombined.minimumRequiredSpace=524288000
    agentaccesscombined.channels.chaccesscombined.keep-alive=30
    agentaccesscombined.channels.chaccesscombined.write-timeout=30
    agentaccesscombined.channels.chaccesscombined.checkpoint-timeout=6000
    agentaccesscombined.channels.chaccesscombined.checkpointDir=/tmp/flume/java/checkpoint_accesscombined
    agentaccesscombined.channels.chaccesscombined.dataDirs=/tmp/flume/java/data_accesscombined

    agentaccesscombined.sources.spooldir-accesscombinedsource.channels=chaccesscombined
    agentaccesscombined.sinks.hdfs-accesscombinedsink.channel=chaccesscombined

If I copy the file with WinSCP everything works fine, but with scp it does not. Please help.
Thanks in advance.

pkmbmrz7 (answer 1)

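To resolve the immediate problem, restart the Flume agent. Then switch to a method of copying files that is atomic.
The spooling directory source requires that a file does not change once the source has started reading it. If the file does change, the source logs an error message and starts producing errors like the one you describe. cp is not atomic. I am not sure about scp and similar tools. The likely fix is to copy the file to a temporary directory first and then use mv to move it into the spooling directory.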
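
The copy-then-move approach can be sketched in a few lines of Python. This is only an illustration, not part of Flume: the function name deliver_atomically and the staging directory are hypothetical, and the sketch assumes the staging directory sits on the same filesystem as the spooling directory, since only then is the final rename a single atomic step.

```python
import os
import shutil
import tempfile


def deliver_atomically(src_path, spool_dir, staging_dir):
    """Copy src_path into staging_dir, then rename it into spool_dir.

    Assumption: staging_dir and spool_dir are on the same filesystem,
    so os.replace() is a single atomic rename.
    """
    os.makedirs(staging_dir, exist_ok=True)
    name = os.path.basename(src_path)
    final_path = os.path.join(spool_dir, name)

    # Refuse to reuse a name that is still present in the spool directory.
    if os.path.exists(final_path):
        raise FileExistsError(final_path)

    # Slow part: the copy happens in a directory Flume does not watch.
    fd, tmp_path = tempfile.mkstemp(prefix=name + ".", dir=staging_dir)
    os.close(fd)
    shutil.copyfile(src_path, tmp_path)

    # Fast part: the complete file appears in the spool directory in one step.
    os.replace(tmp_path, final_path)
    return final_path
```

With the configuration from the question, spool_dir would be /var/spoolAccessCombinedDir, and staging_dir could be any sibling directory on the same mount that Flume is not configured to read.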

acruukt9 (answer 2)

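You can use WinSCP to upload the file to a temporary directory and then move it into the directory Flume monitors with "mv". The mv operation is atomic as long as the temporary directory and the monitored directory are on the same filesystem. You may need inotify to automate this step.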
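
As a rough idea of what that automation could look like, here is a dependency-free Python sketch. It uses plain polling rather than inotify, so it is a stand-in for the suggestion above and not an implementation of it. The function names and the rule "a file is finished when its size and modification time are unchanged between two scans" are assumptions made for illustration; a stalled upload could fool that rule, so treat it as a starting point only. The staging and spool directories are again assumed to be on the same filesystem.

```python
import os
import time


def snapshot(directory):
    """Map each regular file in directory to its (size, mtime_ns) signature."""
    state = {}
    for entry in os.scandir(directory):
        if entry.is_file():
            st = entry.stat()
            state[entry.name] = (st.st_size, st.st_mtime_ns)
    return state


def move_stable_files(staging_dir, spool_dir, previous):
    """Move files whose signature is unchanged since the previous scan.

    Returns (snapshot of the files still in staging, list of moved names).
    """
    current = snapshot(staging_dir)
    moved = []
    for name, signature in current.items():
        if previous.get(name) == signature:
            # Atomic rename, assuming both directories share a filesystem.
            os.replace(os.path.join(staging_dir, name),
                       os.path.join(spool_dir, name))
            moved.append(name)
    for name in moved:
        del current[name]
    return current, moved


def watch(staging_dir, spool_dir, interval_seconds=5.0):
    """Poll staging_dir forever, handing stable files over to spool_dir."""
    previous = {}
    while True:
        previous, _ = move_stable_files(staging_dir, spool_dir, previous)
        time.sleep(interval_seconds)
```

Calling watch() with the upload directory and the Flume spooling directory would run the loop until the process is stopped. An inotify-based version could react to close or rename events instead of waiting for the next scan.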
