flume elasticsearchsink不会消耗所有消息

vojdkbi0  于 2021-06-03  发布在  Hadoop
关注(0)|答案(1)|浏览(349)

我正在使用flume处理hdfs的日志行,并使用elasticsearchsink将它们记录到elasticsearch中。
以下是我的配置:

agent.channels.memory-channel.type = memory

agent.sources.tail-source.type = exec
agent.sources.tail-source.command = tail -4000 /home/cto/hs_err_pid11679.log
agent.sources.tail-source.channels = memory-channel

agent.sinks.log-sink.channel = memory-channel
agent.sinks.log-sink.type = logger

##### INTERCEPTORS

agent.sources.tail-source.interceptors = timestampInterceptor
agent.sources.tail-source.interceptors.timestampInterceptor.type = org.apache.flume.interceptor.TimestampInterceptor$Builder

#### SINK

# Setting the sink to HDFS

agent.sinks.hdfs-sink.channel = memory-channel
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = hdfs://localhost:8020/data/flume/%y-%m-%d/
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.hdfs.inUsePrefix =.
agent.sinks.hdfs-sink.hdfs.rollCount = 0
agent.sinks.hdfs-sink.hdfs.rollInterval = 0
agent.sinks.hdfs-sink.hdfs.rollSize = 10000000
agent.sinks.hdfs-sink.hdfs.idleTimeout = 10
agent.sinks.hdfs-sink.hdfs.writeFormat = Text

agent.sinks.elastic-sink.channel = memory-channel
agent.sinks.elastic-sink.type = org.apache.flume.sink.elasticsearch.ElasticSearchSink
agent.sinks.elastic-sink.hostNames = 127.0.0.1:9300
agent.sinks.elastic-sink.indexName = flume_index
agent.sinks.elastic-sink.indexType = logs_type
agent.sinks.elastic-sink.clusterName = elasticsearch
agent.sinks.elastic-sink.batchSize = 500
agent.sinks.elastic-sink.ttl = 5d
agent.sinks.elastic-sink.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchDynamicSerializer

# Finally, activate.

agent.channels = memory-channel
agent.sources = tail-source
agent.sinks = log-sink hdfs-sink elastic-sink

问题是,我在使用kibana的elastic中只看到1-2条消息,在hdfs文件中看到很多消息。
你知道我在这里遗漏了什么吗?

yiytaume

yiytaume1#

该问题与序列化程序中的错误有关。如果我们不排队:

agent.sinks.elastic-sink.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchDynamicSerializer

消息被毫无问题地使用。问题在于使用序列化程序时创建@timestamp字段的方式。

相关问题