Spooling directory source stuck on exception [Serializer has been closed]

Posted by lsmepo6l on 2021-06-03 in Flume

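I am using the Cloudera Flume spooling directory source with HDFS as the sink, and I am getting a "Serializer has been closed" error. I copy only one file at a time, and the error appears after I copy the first file with scp.
My agent configuration is as follows: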

    agentaccesscombined.sources=spooldir-accesscombinedsource
    agentaccesscombined.sinks=hdfs-accesscombinedsink
    agentaccesscombined.channels=chaccesscombined
    # flume spooldir source
    agentaccesscombined.sources.spooldir-accesscombinedsource.type=spooldir
    agentaccesscombined.sources.spooldir-accesscombinedsource.spoolDir=/var/spoolAccessCombinedDir
    agentaccesscombined.sources.spooldir-accesscombinedsource.ignorePattern=\\w.*.filepart
    agentaccesscombined.sources.spooldir-accesscombinedsource.deletePolicy=immediate
    agentaccesscombined.sources.spooldir-accesscombinedsource.fileSuffix=.SPOOL
    agentaccesscombined.sources.spooldir-accesscombinedsource.fileHeader=true
    agentaccesscombined.sources.spooldir-accesscombinedsource.bufferMaxLineLength=70000
    agentaccesscombined.sources.spooldir-accesscombinedsource.bufferMaxLines=10000
    agentaccesscombined.sources.spooldir-accesscombinedsource.batchSize=1000
    agentaccesscombined.sources.spooldir-accesscombinedsource.fileHeaderKey=file
    #flume hdfs-sink
    agentaccesscombined.sinks.hdfs-accesscombinedsink.type=hdfs
    agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.path=hdfs://cldx-1044:1200:8020/flumeOut_spoolDir_access_combined
    agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.rollSize=12553700
    agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.rollCount=12553665
    agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.rollInterval=100000
    agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.fileType=DataStream
    agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.writeFormat=Text
    agentaccesscombined.sinks.hdfs-accesscombinedsink.round = true
    agentaccesscombined.sinks.hdfs-accesscombinedsink.roundValue=50
    agentaccesscombined.sinks.hdfs-accesscombinedsink.roundUnit=minute
    agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.idleTimeout=5
    #flume channel
    agentaccesscombined.channels.chaccesscombined.type=file
    agentaccesscombined.channels.chaccesscombined.capacity=1000000
    agentaccesscombined.channels.chaccesscombined.transactionCapacity = 1000
    agentaccesscombined.channels.chaccesscombined.checkpointInterval=30000
    agentaccesscombined.channels.chaccesscombined.maxFileSize=2146435071
    agentaccesscombined.channels.chaccesscombined.minimumRequiredSpace=524288000
    agentaccesscombined.channels.chaccesscombined.keep-alive=30
    agentaccesscombined.channels.chaccesscombined.write-timeout=30
    agentaccesscombined.channels.chaccesscombined.checkpoint-timeout=6000
    agentaccesscombined.channels.chaccesscombined.checkpointDir=/tmp/flume/java/checkpoint_accesscombined
    agentaccesscombined.channels.chaccesscombined.dataDirs=/tmp/flume/java/data_accesscombined
    agentaccesscombined.sources.spooldir-accesscombinedsource.channels=chaccesscombined
    agentaccesscombined.sinks.hdfs-accesscombinedsink.channel=chaccesscombined

If I copy the file with WinSCP it works fine, but not with scp. Please help.
Thanks in advance.
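One plausible explanation for the WinSCP/scp difference, assuming WinSCP's default behaviour of uploading to a temporary "<name>.filepart" file and renaming it when the transfer finishes: the ignorePattern in the config hides those temporary names from the source, whereas scp writes directly under the final name, so the source can open a file that is still growing. The Python sketch below approximates the name check (a whole-name regex match, as Java's matches() does). Note that Java properties loading turns "\\w" into "\w", so that is the regex actually compiled. The file names are hypothetical.

```python
import re

# Regex as Flume would see it after java.util.Properties unescapes "\\w" to "\w".
IGNORE_PATTERN = re.compile(r"\w.*.filepart")


def is_ignored(file_name: str) -> bool:
    """Approximate the spooldir filter: the whole file name must match."""
    return IGNORE_PATTERN.fullmatch(file_name) is not None


# Hypothetical names: a WinSCP temporary upload vs. an scp upload in progress.
for name in ["access.log.filepart", "access.log"]:
    print(name, "ignored" if is_ignored(name) else "picked up while still being written")
```

With scp nothing marks the file as incomplete, which matches the advice in the answers below to make the file appear in the spool directory only once it is complete.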

Answer #1, by pkmbmrz7

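To fix the immediate problem, restart the Flume agent. Then use a method that copies files atomically.
The spooling directory source requires that a file not change after it has started reading it. If a file does change, the source logs an error message and starts producing errors similar to the one shown above. "cp" is not atomic. I don't know about "scp" and similar tools. You could copy to a temporary directory first and then use "mv" (this is only atomic if the temporary directory is on the same filesystem as the spool directory).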
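A minimal Python sketch of the "copy to a temporary directory, then mv" approach, assuming the staging directory sits on the same filesystem as the spool directory (the directory names in the demo are hypothetical). The slow copy happens outside the watched directory; the only operation the spooling source can observe is a single rename, which is atomic on POSIX within one filesystem.

```python
import os
import shutil
import tempfile


def publish_atomically(src: str, staging_dir: str, spool_dir: str) -> str:
    """Make src appear in spool_dir only as a complete file.

    staging_dir must be on the same filesystem as spool_dir: os.rename is
    atomic there, but raises OSError when crossing filesystems.
    """
    name = os.path.basename(src)
    staged = os.path.join(staging_dir, name)
    # The slow, non-atomic copy happens outside the directory Flume watches.
    shutil.copyfile(src, staged)
    target = os.path.join(spool_dir, name)
    # One rename: the spooling source sees either nothing or the whole file.
    os.rename(staged, target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        staging = os.path.join(root, "staging")  # hypothetical layout
        spool = os.path.join(root, "spool")
        os.mkdir(staging)
        os.mkdir(spool)
        src = os.path.join(root, "access.log")
        with open(src, "w") as f:
            f.write('127.0.0.1 - - [03/Jun/2021] "GET / HTTP/1.1" 200 512\n')
        print(publish_atomically(src, staging, spool))
```

From a shell the same idea would be an scp into a staging directory on the Flume host followed by a single mv into /var/spoolAccessCombinedDir.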

Answer #2, by acruukt9

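You can use WinSCP to upload files to a temporary directory and then move them into the directory Flume monitors with "mv". The mv operation is atomic as long as both directories are on the same filesystem. You may need inotify to automate this.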
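A rough sketch of automating the move. Python's standard library has no inotify binding, so this swaps in a plainly different technique: polling, where a file is treated as finished once its size and mtime have not changed for quiet_seconds. That is a heuristic (a stalled upload can look finished); an inotify-based watcher reacting to IN_CLOSE_WRITE, for example via inotifywait -m -e close_write, avoids that guess. The function names and directory layout here are hypothetical.

```python
import os
import tempfile
import time


def move_stable_files(staging_dir, spool_dir, seen, quiet_seconds=5.0, now=None):
    """One polling pass over staging_dir.

    seen maps file name -> (size, mtime, time this state was first seen).
    Files unchanged for quiet_seconds are renamed into spool_dir.
    """
    now = time.time() if now is None else now
    moved = []
    names = [n for n in os.listdir(staging_dir)
             if os.path.isfile(os.path.join(staging_dir, n))]
    for gone in set(seen) - set(names):  # forget files that vanished
        del seen[gone]
    for name in names:
        path = os.path.join(staging_dir, name)
        st = os.stat(path)
        state = (st.st_size, st.st_mtime)
        prev = seen.get(name)
        if prev is None or prev[:2] != state:
            seen[name] = (*state, now)  # new or still growing: restart the clock
            continue
        if now - prev[2] >= quiet_seconds:
            # Atomic only if staging_dir and spool_dir share a filesystem.
            os.rename(path, os.path.join(spool_dir, name))
            del seen[name]
            moved.append(name)
    return moved


def watch(staging_dir, spool_dir, interval=1.0):
    """Run polling passes forever (stop with Ctrl+C)."""
    seen = {}
    while True:
        move_stable_files(staging_dir, spool_dir, seen)
        time.sleep(interval)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        staging, spool = os.path.join(root, "staging"), os.path.join(root, "spool")
        os.mkdir(staging)
        os.mkdir(spool)
        with open(os.path.join(staging, "access.log"), "w") as f:
            f.write("line\n")
        state = {}
        move_stable_files(staging, spool, state, now=0.0)          # first sighting
        print(move_stable_files(staging, spool, state, now=10.0))  # quiet long enough
```

Passing now explicitly only serves the demo; watch() uses the real clock.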
