Below is my configuration file. It used to work, but then it suddenly started giving errors. What I am actually trying to do is move all logs from local to HDFS; each log should land in HDFS as a single file, not split into pieces:
# create source, channels, and sink
agent1.sources=S1
agent1.sinks=H1
agent1.channels=C1
# bind the source and sink to the channel
agent1.sources.S1.channels=C1
agent1.sinks.H1.channel=C1
# Specify the source type and directory
agent1.sources.S1.type=spooldir
agent1.sources.S1.spoolDir=/tmp/spooldir
# Specify the Sink type, directory, and parameters
agent1.sinks.H1.type=HDFS
agent1.sinks.H1.hdfs.path=/user/hive/warehouse
agent1.sinks.H1.hdfs.filePrefix=events
agent1.sinks.H1.hdfs.fileSuffix=.log
agent1.sinks.H1.hdfs.inUsePrefix=processing
A1.sinks.H1.hdfs.fileType=DataStream
# Specify the channel type (Memory vs File)
agent1.channels.C1.type=file
I run the agent with the following command:
flume-ng agent --conf-file /usr/local/flume/conf/spoolingToHDFS.conf --name agent1
Then I get this warning:
Warning: No configuration directory set! Use --conf <dir> to override.
and also:
16/10/14 16:22:37 WARN conf.FlumeConfiguration: Agent configuration for 'A1' does not contain any channels. Marking it as invalid.
16/10/14 16:22:37 WARN conf.FlumeConfiguration: Agent configuration invalid for agent 'A1'. It will be removed.
After that it keeps creating, closing, and renaming what is essentially the same log on HDFS, like this:
16/10/14 16:22:41 INFO node.Application: Starting Sink H1
16/10/14 16:22:41 INFO node.Application: Starting Source S1
16/10/14 16:22:41 INFO source.SpoolDirectorySource: SpoolDirectorySource source starting with directory: /tmp/spooldir
16/10/14 16:22:41 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: H1: Successfully registered new MBean.
16/10/14 16:22:41 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: H1 started
16/10/14 16:22:41 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: S1: Successfully registered new MBean.
16/10/14 16:22:41 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: S1 started
16/10/14 16:22:41 INFO hdfs.HDFSSequenceFile: writeFormat = Writable, UseRawLocalFileSystem = false
16/10/14 16:22:42 INFO hdfs.BucketWriter: Creating /user/hive/warehouse/processingevents.1476476561961.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Closing /user/hive/warehouse/processingevents.1476476561961.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Renaming /user/hive/warehouse/processingevents.1476476561961.log.tmp to /user/hive/warehouse/events.1476476561961.log
16/10/14 16:22:44 INFO hdfs.BucketWriter: Creating /user/hive/warehouse/processingevents.1476476561962.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Closing /user/hive/warehouse/processingevents.1476476561962.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Renaming /user/hive/warehouse/processingevents.1476476561962.log.tmp to /user/hive/warehouse/events.1476476561962.log
16/10/14 16:22:44 INFO hdfs.BucketWriter: Creating /user/hive/warehouse/processingevents.1476476561963.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Closing /user/hive/warehouse/processingevents.1476476561963.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Renaming /user/hive/warehouse/processingevents.1476476561963.log.tmp to /user/hive/warehouse/events.1476476561963.log
16/10/14 16:22:44 INFO hdfs.BucketWriter: Creating /user/hive/warehouse/processingevents.1476476561964.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Closing /user/hive/warehouse/processingevents.1476476561964.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Renaming /user/hive/warehouse/processingevents.1476476561964.log.tmp to /user/hive/warehouse/events.1476476561964.log
16/10/14 16:22:44 INFO hdfs.BucketWriter: Creating /user/hive/warehouse/processingevents.1476476561965.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Closing /user/hive/warehouse/processingevents.1476476561965.log.tmp
:
:
:
Why does Flume keep writing the same file to HDFS over and over, and how can I move a log from local to HDFS without splitting it into parts? My logs are usually between 50 KB and 300 KB in size.
Update, new warnings:
16/10/18 10:10:05 INFO tools.DirectMemoryUtils: Unable to get maxDirectMemory from VM: NoSuchMethodException: sun.misc.VM.maxDirectMemory(null)
16/10/18 10:10:05 WARN file.ReplayHandler: Ignoring /home/USER/.flume/file-channel/data/log-18 due to EOF
java.io.EOFException
at java.io.RandomAccessFile.readInt(RandomAccessFile.java:827)
at org.apache.flume.channel.file.LogFileFactory.getSequentialReader(LogFileFactory.java:169)
at org.apache.flume.channel.file.ReplayHandler.replayLog(ReplayHandler.java:264)
at org.apache.flume.channel.file.Log.doReplay(Log.java:529)
at org.apache.flume.channel.file.Log.replay(Log.java:455)
at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:295)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
1 Answer
Flume uses the conf directory to pick up its JRE and logging properties; you can point to it with --conf <dir>, as the warning says.
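For example, the agent could be started like this (assuming /usr/local/flume/conf is the directory holding flume-env.sh and log4j.properties, which matches the path of the config file above):

flume-ng agent --conf /usr/local/flume/conf --conf-file /usr/local/flume/conf/spoolingToHDFS.conf --name agent1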
The warning about A1 is caused by what looks like a typo near the end of your agent configuration file:

A1.sinks.H1.hdfs.fileType=DataStream

should be

agent1.sinks.H1.hdfs.fileType=DataStream

Because that line is prefixed with A1 instead of agent1, the setting is never applied and the sink falls back to the default SequenceFile format, which is why your log shows hdfs.HDFSSequenceFile: writeFormat = Writable.
As for the files: you have not configured a deserializer for the spooldir source, and the default is LINE, so you get one HDFS file per line of each file in your spool directory. If you want Flume to treat an entire file as a single event, you need to use the BlobDeserializer (https://flume.apache.org/flumeuserguide.html#blobdeserializer).
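A minimal sketch of the source section with the BlobDeserializer enabled (the class name is the one listed in the Flume user guide; deserializer.maxBlobLength defaults to roughly 100 MB, so 50-300 KB logs fit comfortably in a single event):

# Spooling directory source that reads each file as one BLOB event
agent1.sources.S1.type=spooldir
agent1.sources.S1.spoolDir=/tmp/spooldir
agent1.sources.S1.deserializer=org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder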